My requirement is to call a "SparkScala" function from an existing PySpark program. What is the best way to pass the SparkSession created in the PySpark program to the Scala function? I pass my Scala jar to PySpark and create the session as follows:

    spark = SparkSession.builder \
        .appName("PySpark using Scala ex...
This fails with:

    java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
        at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
        at org.apache.spark.sql.SparkSession.sessionState$lzycompu...
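One approach that generally works is to launch PySpark with the Scala jar on the classpath and hand the Scala side the underlying JVM objects through py4j. The sketch below assumes a hypothetical Scala object com.example.SparkScalaHelper with a process(spark: SparkSession, df: Dataset[Row]) method and a hypothetical jar path; substitute your own names.

    from pyspark.sql import SparkSession, DataFrame

    # Put the Scala jar on the classpath when building the session
    # (or pass it with --jars to spark-submit / pyspark).
    spark = (SparkSession.builder
             .appName("PySpark using Scala example")
             .config("spark.jars", "/path/to/spark-scala-helper.jar")  # hypothetical path
             .getOrCreate())

    df = spark.range(10)

    # py4j exposes the JVM under spark._jvm. Pass the *Java* session
    # (spark._jsparkSession) and the Java DataFrame (df._jdf) to the
    # hypothetical Scala object.
    result_jdf = spark._jvm.com.example.SparkScalaHelper.process(spark._jsparkSession, df._jdf)

    # Wrap the returned Java DataFrame back into a PySpark DataFrame
    # (on Spark 3.3+ the constructor accepts the SparkSession; older
    # versions expect a SQLContext, e.g. spark._wrapped).
    result = DataFrame(result_jdf, spark)
    result.show()

Because the Scala function receives the same JVM SparkSession that PySpark already owns, no second session needs to be created on the Scala side.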
On SQL Server, I have to select, for each year, the month that occurs most often, and sort the result from highest to lowest. My query groups and orders like this:

    GROUP BY year, month ORDER BY ROW_NUMBER() OVER (PARTITION BY year ORDER BY COUNT(day) DESC) LIMIT 1

But how do I do the same in PySpark?
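A PySpark sketch of that logic, assuming a DataFrame with year, month and day columns (the column names mirror the SQL above; the sample data is made up):

    from pyspark.sql import SparkSession, Window, functions as F

    spark = SparkSession.builder.appName("most-frequent-month-per-year").getOrCreate()

    # Hypothetical input: one row per (year, month, day) occurrence.
    df = spark.createDataFrame(
        [(2020, 1, 5), (2020, 1, 9), (2020, 3, 2),
         (2021, 7, 1), (2021, 7, 8), (2021, 2, 3)],
        ["year", "month", "day"],
    )

    # Count rows per (year, month), then keep the top month within each year.
    counts = df.groupBy("year", "month").agg(F.count("day").alias("cnt"))
    w = Window.partitionBy("year").orderBy(F.desc("cnt"))
    top_month_per_year = (counts
                          .withColumn("rn", F.row_number().over(w))
                          .filter(F.col("rn") == 1)
                          .drop("rn")
                          .orderBy(F.desc("cnt")))  # highest counts first

    top_month_per_year.show()

row_number() here plays the same role as ROW_NUMBER() OVER (PARTITION BY year ...) in the SQL version: it ranks months inside each year by their count, and the filter keeps only the top-ranked month per year.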