My PySpark script is m.py; it contains:
l = [1,2,3,4,7,5,6,7,8,9,0]
k = sc.parallelize(l)
type(k)
When I run spark-submit m.py, I get:
SPARK_MAJOR_VERSION is set to 2, using Spark2
Traceback (most recent call last):
File "/root/m.py", line 3, in <module>
k = sc.parallelize(l)
NameError: name 'sc' is not defined
Is there a way to run the script outside the pyspark shell? I'm stuck.
Likewise, when I start pyspark and then type:
import m
the same error appears:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "m.py", line 3, in <module>
k = sc.parallelize(l)
NameError: name 'sc' is not defined
Posted on 2018-05-11 18:00:02
In your driver program, make sure you create a SparkContext variable first. As I can see, you used 'sc' directly without ever initializing it; 'sc' is only predefined for you inside the pyspark shell. Once it is initialized, you can run your program:
# Add these two lines at the top of m.py itself, before 'sc' is first used:
from pyspark import SparkContext
sc = SparkContext.getOrCreate()
# With that in place, both 'spark-submit m.py' and 'import m' work.
# (Note: 'import m.py' is invalid syntax; a module is imported by name: 'import m'.)
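Since the log line shows Spark 2 in use, a complete m.py can also be sketched with SparkSession, the Spark 2 entry point. This is a minimal illustration, not code from the question (the app name "m" is an arbitrary choice):

# m.py -- self-contained, runnable with: spark-submit m.py
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("m").getOrCreate()
sc = spark.sparkContext  # the same object the pyspark shell predefines as sc

l = [1, 2, 3, 4, 7, 5, 6, 7, 8, 9, 0]
k = sc.parallelize(l)
print(type(k))      # <class 'pyspark.rdd.RDD'>
print(k.collect())  # [1, 2, 3, 4, 7, 5, 6, 7, 8, 9, 0]

spark.stop()

getOrCreate() is the safe choice here: if the file is later imported from a running pyspark shell, it reuses the existing context instead of failing, since only one SparkContext may be active per JVM.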
Source: https://stackoverflow.com/questions/50289566