Spark with Cassandra: Failed to register spark.kryo.registrator

Stack Overflow user
Asked on 2013-10-28 15:40:03
1 answer · 1.3K views · 0 followers · 2 votes

I'm currently running into problems trying to run Spark with Cassandra in standalone mode.

Initially, I ran successfully with the parameter master="local[4]" in the SparkContext.

Then I tried switching to standalone mode. I'm using:

Ubuntu 12.04, Cassandra 1.2.11, Spark 0.8.0, Scala 2.9.3, JDK Oracle 1.6.0_35, Kryo 2.21

At first I got the "unread block data" error. As suggested in other threads, I switched to the Kryo serializer and added Twitter Chill. Then I got "Failed to register spark.kryo.registrator" in the console, with the exception below:

13/10/28 12:12:36 INFO cluster.ClusterTaskSetManager: Lost TID 0 (task 0.0:0)
13/10/28 12:12:36 INFO cluster.ClusterTaskSetManager: Loss was due to java.io.EOFException
java.io.EOFException
    at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:109)
    at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:150)
    at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.readObject(Unknown Source)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:435)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
    at java.io.ObjectInputStream.readSerialData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.readObject(Unknown Source)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:39)
    at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:61)
    at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:129)
    at java.io.ObjectInputStream.readExternalData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.readObject(Unknown Source)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:39)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:61)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:153)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
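For reference, switching to Kryo via Twitter Chill typically means adding the chill dependency to the build. A minimal build.sbt sketch, assuming sbt; the chill version here is an assumption, pick one cross-built for Scala 2.9.3:

```scala
// build.sbt (sketch): pull in Twitter Chill for Kryo-based serialization.
// The version is an assumption; use one compatible with Scala 2.9.3.
libraryDependencies += "com.twitter" % "chill_2.9.3" % "0.3.1"
```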

Others have hit EOFException in Spark before, and the answer there was that the registrator wasn't registered correctly. I registered my registrator following the Spark guide. It looks like this:

    import java.nio.ByteBuffer

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.serializer.KryoRegistrator

    class MyRegistrator extends KryoRegistrator {
        override def registerClasses(kryo: Kryo) {
            // Register the RDD of map tuples plus the key/value types the job uses.
            kryo.register(classOf[org.apache.spark.rdd.RDD[(Map[String, ByteBuffer], Map[String, ByteBuffer])]])
            kryo.register(classOf[String], 1)
            kryo.register(classOf[Map[String, ByteBuffer]], 2)
        }
    }

And I also set the properties as the guide instructs:

    System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    System.setProperty("spark.kryo.registrator", "main.scala.MyRegistrator")
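One detail worth checking: these system properties must be set before the SparkContext is constructed, since Spark reads the serializer settings when the context is created. A minimal ordering sketch; the master URL and app name are placeholders:

```scala
import org.apache.spark.SparkContext

// Set serializer properties BEFORE constructing the SparkContext;
// Spark reads them at context creation time.
System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
System.setProperty("spark.kryo.registrator", "main.scala.MyRegistrator")

val sc = new SparkContext("spark://master-host:7077", "CassandraJob")
```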

Can anyone give me a hint as to where I went wrong? Thanks.


1 Answer

Stack Overflow user

Posted on 2013-11-05 12:01:49

In my experience, "EOFException" and "unread block data" have the same cause: some libraries are missing on the cluster. Strangely, I had added the libraries built with "sbt assembly" into Spark's jars folder, and they really were there, yet Spark still failed to find and load them. Only when I added the libraries to the SparkContext did it work. In other words, I needed to ship the libraries to every node by specifying them in the code.
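As a sketch of that fix: the Spark 0.8 SparkContext constructor accepts a list of jars to ship to every worker. The master URL, Spark home, and jar path below are placeholders:

```scala
import org.apache.spark.SparkContext

// Pass the jars the job needs so Spark ships them to every worker node.
// Master URL, app name, Spark home, and jar path are hypothetical.
val jars = Seq("target/scala-2.9.3/my-cassandra-job-assembly.jar")  // from `sbt assembly`
val sc = new SparkContext("spark://master-host:7077", "CassandraJob",
                          "/opt/spark-0.8.0", jars)
```

Alternatively, sc.addJar(path) can be called after construction to distribute a jar the same way.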

0 votes

Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/19629360