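I am using Dataproc in the cloud for Spark computation. The problem is that my worker nodes cannot access the textblob package. How can I fix this? I am writing the code in a Jupyter notebook with the PySpark kernel.
Error: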
PythonException:
An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 588, in main
    func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 447, in read_udfs
    udfs.append(read_single_udf(pickleSer, infile, eval_type, runner_conf, udf_index=i))
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 249, in read_single_udf
    f, return_type = read_command(pickleSer, infile)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 69, in read_command
    command = serializer._read_with_length(file)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 160, in _read_with_length
    return self.loads(obj)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 430, in loads
    return pickle.loads(obj, encoding=encoding)
ModuleNotFoundError: No module named 'textblob'
Sample code that fails:
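from pyspark.sql import functions as f
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

data = [{"Category": 'Aaaa'},
        {"Category": 'Bbbb'},
        {"Category": 'Cccc'},
        {"Category": 'Eeeee'}
        ]
# `spark` is the SparkSession provided by the notebook's PySpark kernel
df = spark.createDataFrame(data)

def sentPackage(text):
    # Import the class itself; a bare `import textblob` leaves `TextBlob` undefined
    from textblob import TextBlob
    return float(TextBlob(text).sentiment.polarity)

# Polarity is a float, so the return type is DoubleType rather than StringType
sentPackageUDF = udf(sentPackage, DoubleType())
df = df.withColumn("polarity", sentPackageUDF(f.col("Category")))
df.show()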
Posted on 2022-01-12 15:56:28
The key is to define a function that will be sent to the workers and to import textblob inside that function.
def function_to_be_executed_by_workers(...):
    import textblob
    # use textblob and perform operations on data
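A function-level import like the one above still needs the package to be installed in the Python environment where the function runs. The following is a minimal sketch, not a definitive fix, of a worker-side function that checks for the package before importing it. The names `module_available` and `polarity_or_none` are hypothetical and not from the original post.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be found
    in the Python environment where this function is running."""
    return importlib.util.find_spec(name) is not None


def polarity_or_none(text):
    """Hypothetical worker-side function: import textblob lazily,
    inside the body, and return None if it is not installed."""
    if not module_available("textblob"):
        # The package is missing in this environment (for example, on a worker).
        return None
    from textblob import TextBlob
    return float(TextBlob(text).sentiment.polarity)
```

In a Spark session this could be wrapped as `udf(polarity_or_none, DoubleType())`. A `polarity` column that comes back entirely null would then suggest that textblob is missing on the workers, in which case the package has to be installed on every worker node and not only on the driver.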
https://stackoverflow.com/questions/70683090