I have a simple Apache Spark app that reads a file from HDFS and pipes it to an external process. When I read a large amount of data (in my case, a file of about 241 MB) and either do not specify a minimum number of partitions or set the minimum to 4, I get the following error:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.
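For reference, a minimal sketch of the setup being described, assuming Scala; the input/output paths and the external command are placeholders, not taken from the original app:

```scala
import org.apache.spark.sql.SparkSession

object PipeApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PipeApp").getOrCreate()
    val sc = spark.sparkContext

    // minPartitions = 4 mirrors the failing configuration described above
    val lines = sc.textFile("hdfs:///path/to/input", minPartitions = 4)

    // Each line of every partition is written to the external process's
    // stdin; the process's stdout becomes the resulting RDD.
    val piped = lines.pipe("/path/to/external/binary")

    piped.saveAsTextFile("hdfs:///path/to/output")
    spark.stop()
  }
}
```

Note that the truncated stack trace above only shows the stage failure; the root cause (e.g. an executor OOM or the external process exiting non-zero) would appear further down in the executor logs.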