Following a tutorial, starting a custom version of Spark on HDP works fine, as follows:
# download a current headless version of spark
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
export HADOOP_CONF_DIR=/usr/hdp/current/spark2-client/conf
export SPARK_HOME=<<path/to>>/spark-2.4.3-bin-without-hadoop/
<<path/to>>/spark-2.4.3-
The Spark Thrift server tries to load the complete dataset into memory before sending it over JDBC, and on the JDBC client I get this error:
SQL Error: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 48 tasks (XX GB) is bigger than spark.driver.maxResultSize (XX GB)
org.apache.spark.SparkException: Job aborted due to stage
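The limit in the error message is the Spark setting `spark.driver.maxResultSize`. Another setting is aimed at the "load everything before sending" behaviour described above: `spark.sql.thriftServer.incrementalCollect`. This is an internal flag in Spark 2.x. When it is true, the Thrift server fetches results one partition at a time instead of collecting the whole result on the driver.

The snippet below is only a dry-run sketch of passing both settings through `start-thriftserver.sh`, which accepts `spark-submit` options such as `--conf`. The `4g` default and the `MAX_RESULT_SIZE` / `THRIFT_ARGS` names are hypothetical. The snippet prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Dry-run sketch: build extra --conf arguments for the Spark Thrift server
# and print the resulting command instead of executing it.

# Hypothetical default; choose a limit the driver heap can actually hold.
MAX_RESULT_SIZE="${MAX_RESULT_SIZE:-4g}"

# First pair raises the cap named in the error message.
# Second pair asks the Thrift server to fetch results partition by partition
# rather than collecting the full result set on the driver first.
THRIFT_ARGS=(
  --conf "spark.driver.maxResultSize=${MAX_RESULT_SIZE}"
  --conf "spark.sql.thriftServer.incrementalCollect=true"
)

echo "${SPARK_HOME}/sbin/start-thriftserver.sh ${THRIFT_ARGS[*]}"
```

Raising `spark.driver.maxResultSize` alone only moves the problem, because the driver still has to hold the whole result. Incremental collection trades a single large transfer for one job per partition, which is usually slower but bounded by the largest partition.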