首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >火花流(运行示例与火花提交)

火花流(运行示例与火花提交)
EN

Stack Overflow用户
提问于 2015-11-08 08:13:58
回答 1查看 1.1K关注 0票数 0

这是Spark with Flume (configuration/classpath?)的后续问题

在尝试了几件事情之后,我发现了这个问题,现在的问题是

$spark提交--jars /opt/scala/spark-streaming-flume_2.10-1.5.1.jar --主控本地* /home/user/spark/FlumeStreaming.py

代码语言:javascript
运行
复制
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.flume import FlumeUtils

sc = SparkContext(appName="Newapp")
strm = StreamingContext(sc,1)

flume = FlumeUtils.createStream(strm,"localhost",9999)

lines = flume.map(lambda x: x[1])
counts = lines.flatMap(lambda line: line.split(" ")) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a+b)
counts.pprint()

strm.start()
strm.awaitTermination()
代码语言:javascript
运行
复制
15/11/07 23:55:09 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.NoClassDefFoundError: org/apache/flume/source/avro/AvroSourceProtocol
        at org.apache.spark.streaming.flume.FlumeReceiver.responder$lzycompute(FlumeInputDStream.scala:147)
        at org.apache.spark.streaming.flume.FlumeReceiver.responder(FlumeInputDStream.scala:146)
        at org.apache.spark.streaming.flume.FlumeReceiver.initServer(FlumeInputDStream.scala:163)
        at org.apache.spark.streaming.flume.FlumeReceiver.onStart(FlumeInputDStream.scala:170)
        at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
        at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
        at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:542)
        at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:532)
        at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
        at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassNotFoundException: org.apache.flume.source.avro.AvroSourceProtocol
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 16 more

15/11/07 23:55:09 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
15/11/07 23:55:09 ERROR ReceiverTracker: Deregistered receiver for stream 0: Stopped by driver

我的问题是,它与spark示例中提供的flume_wordcount.py代码相同,示例中的代码运行良好,但我的版本不起作用。不同之处在于它使用运行示例运行的方式--一个使用运行示例,另一个使用火花提交,这指向了类路径和jar文件的管理方式。有什么我该做的吗?

EN

回答 1

Stack Overflow用户

发布于 2015-11-09 04:15:10

解决了这个问题,必须有正确的jar并将其传递给spark submit。

代码语言:javascript
运行
复制
$spark-submit --jars /path/to/spark-streaming-flume-assembly*.jar FlumeStreaming.py localhost 12345
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/33591976

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档