我在pycharm上运行以下代码,如果我通过命令提示符提供--jars,则此代码工作正常
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("pySparksqLite_test").\
config('spark.jars.packages', "C:/jars/DataVisualization/sqlite-jdbc-3.20.0.jar").getOrCreate()
spark.conf.set("spark.sql.shuffle.partitions", "5")
df_flight_info = spark.read.format("jdbc").option(url="jdbc:sqlite:C:/sqlite-tools-win32-x86-3290000/my-sqlite.db",
driver="org.sqlite.JDBC",
dbtable="(select DEST_COUNTRY_NAME,ORIGIN_COUNTRY_NAME,count from flight_info)")\
.load()但是有了pycharm我就不会犯错了
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Provided Maven Coordinates must be in the form 'groupId:artifactId:version'. The coordinate provided is: C:/Users/jars/sqlite-jdbc-3.20.0.jar
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.deploy.SparkSubmitUtils$$anonfun$extractMavenCoordinates$1.apply(SparkSubmit.scala:1000)
at org.apache.spark.deploy.SparkSubmitUtils$$anonfun$extractMavenCoordinates$1.apply(SparkSubmit.scala:998)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.deploy.SparkSubmitUtils$.extractMavenCoordinates(SparkSubmit.scala:998)
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1220)
at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:49)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:350)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:170)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Traceback (most recent call last):
File "C:/...../proj1/pySparksqLite.py", line 4, in <module>
config('spark.jars.packages', "C:/Users/jars/sqlite-jdbc-3.20.0.jar").getOrCreate()
File "C:\spark\spark-2.3.0-bin-hadoop2.7\python\pyspark\sql\session.py", line 173, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "C:\spark\spark-2.3.0-bin-hadoop2.7\python\pyspark\context.py", line 331, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "C:\spark\spark-2.3.0-bin-hadoop2.7\python\pyspark\context.py", line 115, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "C:\spark\spark-2.3.0-bin-hadoop2.7\python\pyspark\context.py", line 280, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "C:\spark\spark-2.3.0-bin-hadoop2.7\python\pyspark\java_gateway.py", line 95, in launch_gateway
raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
Process finished with exit code 1我还尝试通过环境变量提供jar文件路径,并通过os设置它。
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars C:/Users/jars/sqlite-jdbc-3.27.2.jar'但即使这样也行不通。
发布于 2019-08-18 06:57:38
相当于--jars提交参数的是spark.jars,它允许您指定本地jars以将其传输到集群。您使用了spark.jars.packages,它允许您通过指定maven坐标从maven下载包。与此等效的提交参数是--packages。
有关更多信息,请查看文档:configuration和submitting parameters
https://stackoverflow.com/questions/57535134
复制相似问题