
mkdir tmp/hive -> create it on the C: drive

spark/conf/log4j -> change it to log4j.rootCategory=ERROR
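If you would rather script this edit, the sketch below rewrites only the level in the log4j.rootCategory line and keeps the appender after the comma. It assumes a Spark version whose conf folder ships log4j.properties.template and that you save the result as log4j.properties. Spark 3.3 and later switched to log4j2.properties.template, which uses rootLogger.level instead. The function name and the C:\spark path in the comments are hypothetical.

```python
import re


def set_root_category(properties_text: str, level: str = "ERROR") -> str:
    """Return properties_text with the level in log4j.rootCategory replaced.

    Appender names after the comma are kept, e.g. "INFO, console" -> "ERROR, console".
    """
    pattern = re.compile(r"^(log4j\.rootCategory\s*=\s*)[^,\s]+", re.MULTILINE)
    # A function replacement avoids backslash surprises in the level string
    return pattern.sub(lambda m: m.group(1) + level, properties_text)


# Usage (hypothetical paths, adjust to your SPARK_HOME):
# with open(r"C:\spark\conf\log4j.properties.template") as src:
#     text = set_root_category(src.read())
# with open(r"C:\spark\conf\log4j.properties", "w") as dst:
#     dst.write(text)
```

For a single session, calling sc.setLogLevel("ERROR") on the SparkContext has a similar effect without touching the file.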

pyspark -> only works in a Java 11 environment
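To check which Java the shell will pick up, java -version prints the version string on stderr. Here is a small parser for that output, as a sketch. It assumes the usual version "..." format and treats old 1.x strings (Java 8 prints 1.8.0_...) as major version x. Both function names are hypothetical.

```python
import re
import subprocess


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from `java -version` output.

    "1.8.0_292" -> 8 (legacy 1.x scheme), "11.0.12" -> 11, "17" -> 17.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a version string")
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)  # Java 8 and earlier report 1.<major>
    return int(first)


def installed_java_major() -> int:
    # `java -version` writes to stderr, not stdout (capture_output needs Python 3.7+)
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)
```

Before launching pyspark, installed_java_major() == 11 should hold for the setup described here.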

Set the PYSPARK_PYTHON environment variable (C:\Users\jsl11\AppData\Local\Programs\Python\Python37\python.exe)

Environment variables to set: PYSPARK_PYTHON, HADOOP_HOME, JAVA_HOME, SPARK_HOME
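A quick way to verify the four variables is to run a check from Python. The sketch below assumes each variable should point to a path that exists: the python.exe file for PYSPARK_PYTHON and an install folder for the others. The helper name check_spark_env is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_VARS = ["PYSPARK_PYTHON", "HADOOP_HOME", "JAVA_HOME", "SPARK_HOME"]


def check_spark_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return {variable: problem} for each required variable that looks wrong.

    An empty result means every variable is set and points to an existing path.
    """
    env = os.environ if env is None else env
    problems = {}
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems[name] = "not set"
        elif not os.path.exists(value):
            problems[name] = "path does not exist: " + value
    return problems
```

Running print(check_spark_env()) in a fresh console (after setting the variables, so it sees the new values) should print {}.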


Running the word_count.text example

rdd = sc.textFile("README.md")

rdd.count()


>> Working in Spyder

- word-count.py

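from pyspark import SparkConf, SparkContext

# Run Spark locally with a single worker thread
conf = SparkConf().setMaster("local").setAppName("wordCount")
sc = SparkContext(conf=conf)

# One RDD element per line of the file ("lines" avoids shadowing the built-in input())
lines = sc.textFile("file:///C:/Users/Administrator/SparkCourse/in/word_count.text")
# Split every line on whitespace and flatten the results into one RDD of words
words = lines.flatMap(lambda line: line.split())
# Action: returns a {word: count} mapping to the driver
wordCounts = words.countByValue()

for word, count in wordCounts.items():
    # Drop non-ASCII characters and skip words that end up empty
    cleanWord = word.encode("ascii", "ignore")
    if cleanWord:
        print(cleanWord.decode() + " " + str(count))

# Release the context so the script can be re-run in the same Spyder console
sc.stop()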
Output (excerpt):
...
ties 2
Midwest 1
settlements 1
Great 3
Lakes. 1
Due 1
City's 1
South, 1
there 1
numerous 1
southern 2
sympathizers 1
early 1
days 1
Civil 1
War 2
mayor 1
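countByValue() sends the whole word-to-count map back to the driver, which is fine for a small file like this one. To check the printed numbers without starting Spark, the same steps (split on whitespace, count, strip non-ASCII characters) can be written with collections.Counter. This is a plain-Python cross-check, not how Spark executes the job. The function name word_counts and the UTF-8 encoding in the usage comment are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def word_counts(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Mirror flatMap(split) + countByValue() + the ASCII cleanup loop.

    Returns (clean_word, count) pairs, one per distinct original word,
    so two words that clean to the same text still appear separately,
    just as the Spark script prints them.
    """
    # Equivalent of flatMap(lambda line: line.split()) followed by countByValue()
    counts = Counter(word for line in lines for word in line.split())
    result = []
    for word, count in counts.items():
        # Same cleanup as the Spark script: drop non-ASCII characters
        clean = word.encode("ascii", "ignore").decode()
        if clean:
            result.append((clean, count))
    return result


# Usage (path copied from the Spark script; encoding is an assumption):
# with open("C:/Users/Administrator/SparkCourse/in/word_count.text", encoding="utf-8") as f:
#     for word, count in word_counts(f):
#         print(word + " " + str(count))
```

On large inputs, the usual Spark pattern is words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b). This keeps the counting distributed instead of collecting a dictionary on the driver.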

 
