For the life of me, I can't figure out what's wrong with my PySpark installation. I've installed all the dependencies, including Hadoop, but PySpark still can't find it. Am I diagnosing this correctly?
See the full error message below; it ultimately fails on the PySpark SQL side:
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':"
nickeleres@Nicks-MBP:~$ pyspark
Python 2.7.10 (default, Feb 7 2017, 00:08:15)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/opt/spark-2.2.0/jars/hadoop-auth-2.7.3.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
17/10/24 21:21:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/10/24 21:21:59 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
17/10/24 21:21:59 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
17/10/24 21:21:59 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
Traceback (most recent call last):
File "/opt/spark/python/pyspark/shell.py", line 45, in <module>
spark = SparkSession.builder\
File "/opt/spark/python/pyspark/sql/session.py", line 179, in getOrCreate
session._jsparkSession.sessionState().conf().setConfString(key, value)
File "/opt/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/opt/spark/python/pyspark/sql/utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':"
>>>
Posted on 2017-10-26 22:07:59
tl;dr Close all other Spark processes and start over.
The warning messages below point out that another process (or several) is already holding those ports.
I'm fairly sure those processes are Spark processes, e.g. other pyspark sessions or Spark applications.
17/10/24 21:21:59 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
17/10/24 21:21:59 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
17/10/24 21:21:59 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
That's why, once Spark/PySpark finally found a free port for the web UI, it went on to instantiate HiveSessionStateBuilder and failed.
pyspark failed because you cannot have more than one Spark application up and running that uses the same local Hive metastore.
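If you cannot find the stray shell or notebook, look for other pyspark or spark-submit processes (for example with ps aux | grep spark) and stop them. As a minimal sketch, from inside any PySpark shell or notebook that is still open you can also release its resources explicitly:
# Run this in the interpreter that is still holding a session: stopping the
# context frees its SparkUI port (4040+); exiting that interpreter entirely
# also releases the embedded Derby metastore lock (metastore_db/db.lck).
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # the session that shell already holds
spark.stop()                                # shuts down the SparkContext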
Posted on 2018-11-14 11:04:23
Why does this happen?
Because we tried to create a new session more than once, e.g. in different browser tabs of a Jupyter notebook.
Solution:
Start a single new session in one Jupyter notebook tab and avoid creating new sessions in other tabs:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('EXAMPLE').getOrCreate()
Posted on 2021-02-09 20:03:13
We hit the same error while trying to create a Spark session from a Jupyter notebook. We noticed that in our case the user did not have permission to the Spark scratch directory, i.e. the directory pointed to by the spark.local.dir property. We changed the permissions on that directory so the user has full access, and the issue was resolved. Typically this directory lives somewhere like /tmp/user.
Note that, per the Spark documentation, spark.local.dir is the 'Directory to use for "scratch" space in Spark, including map output files and RDDs that get stored on disk. This should be on a fast, local disk in your system. It can also be a comma-separated list of multiple directories on different disks.'
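As a minimal sketch of that fix (assuming a hypothetical path /tmp/spark-scratch that the current user owns), point spark.local.dir at a writable directory when building the session:
import os
from pyspark.sql import SparkSession

scratch_dir = '/tmp/spark-scratch'   # hypothetical path; pick a fast local disk you own
if not os.path.isdir(scratch_dir):
    os.makedirs(scratch_dir)         # make sure it exists and is writable by this user

spark = (SparkSession.builder
         .appName('EXAMPLE')
         .config('spark.local.dir', scratch_dir)  # only takes effect before the context starts
         .getOrCreate())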
https://stackoverflow.com/questions/46924010