Spark Streaming "Failed to read checkpoint from directory ...": Symptom, Solution, and Cause

Symptom

Submitting a Spark Streaming application to a YARN cluster with spark-submit fails with the following error:

Caused by: java.lang.ClassNotFoundException: XXXStartup$$anonfun$9
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:266)
    at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:623)
    at org.apache.spark.streaming.ObjectInputStreamWithLoader.resolveClass(Checkpoint.scala:286)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1610)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1515)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1769)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1704)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1342)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1704)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1342)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
    at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:499)
    at org.apache.spark.streaming.DStreamGraph$$anonfun$readObject$1.apply$mcV$sp(DStreamGraph.scala:188)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1138)
    ... 31 more
Exception in thread "main" org.apache.spark.SparkException: Failed to read checkpoint from directory XXX_startup
    at org.apache.spark.streaming.CheckpointReader$.read(Checkpoint.scala:272)
    at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:624)
    at XXXStartup$.main(XXXStartup.scala:79)
    at XXXStartup.main(XXXStartup.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Solution and Cause

The StreamingContext is created like this:

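    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // `topic`, `args`, and `AppConf` are defined elsewhere in the application.
    val createStreamingContext = (checkPointDir: String) => {
      val sparkConf = new SparkConf().setAppName(topic)
      val sparkContext = new SparkContext(sparkConf)
      // args(1) is the batch interval in seconds.
      val streamingContext = new StreamingContext(sparkContext, Seconds(args(1).toInt))
      streamingContext.checkpoint(checkPointDir)
      streamingContext
    }
    val checkPointDir = AppConf.strCheckPointPrefix + topic
    // Restore from the checkpoint if one exists; otherwise build a fresh context.
    val streamingContext = StreamingContext.getOrCreate(checkPointDir, () => createStreamingContext(checkPointDir))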

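The error occurs because, after the application jar was rebuilt, the checkpoint files written by the previous build were still in the checkpoint directory when the app was resubmitted. The checkpoint stores the serialized DStream graph, which refers by name to compiler-generated closure classes such as XXXStartup$$anonfun$9. Those names can change when the code is recompiled, so the new jar may no longer contain the class that the old checkpoint expects. Deleting the old checkpoint files and then resubmitting resolves the problem.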
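Clearing the checkpoint directory can also be done from code instead of by hand. The following is a minimal sketch, assuming the directory is reachable through the Hadoop FileSystem API (HDFS or a local path); the object name `CheckpointCleaner` and the method `clearCheckpoint` are hypothetical helpers and not part of Spark or of the original application.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper, not part of Spark.
object CheckpointCleaner {
  /** Recursively deletes `checkPointDir` if it exists.
    * Returns true if the directory was deleted, false if there was nothing to delete.
    */
  def clearCheckpoint(checkPointDir: String,
                      hadoopConf: Configuration = new Configuration()): Boolean = {
    val path = new Path(checkPointDir)
    // Resolve the file system (HDFS, local, ...) from the path and the configuration.
    val fs = path.getFileSystem(hadoopConf)
    if (fs.exists(path)) fs.delete(path, true) else false
  }
}
```

Calling `CheckpointCleaner.clearCheckpoint(checkPointDir)` before `StreamingContext.getOrCreate` only makes sense when a rebuilt jar is being deployed (for example, behind a command-line flag), because deleting the directory on every start would discard the recovery state that checkpointing exists to provide. Alternatively, `StreamingContext.getOrCreate` accepts a `createOnError` flag (default `false`); setting it to `true` makes Spark fall back to the creating function when the checkpoint cannot be read, at the cost of silently ignoring the old state.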