这是Spark with Flume (configuration/classpath?)的后续问题
在尝试了几件事情之后,我发现了这个问题,现在的问题是
$spark提交--jars /opt/scala/spark-streaming-flume_2.10-1.5.1.jar --主控本地* /home/user/spark/FlumeStreaming.py
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.flume import FlumeUtils
sc = SparkContext(appName="Newapp")
strm = StreamingContext(sc,1)
flume = FlumeUtils.createStream(strm,"localhost",9999)
lines = flume.map(lambda x: x[1])
counts = lines.flatMap(lambda line: line.split(" ")) \
.map(lambda word: (word, 1)) \
.reduceByKey(lambda a, b: a+b)
counts.pprint()
strm.start()
strm.awaitTermination()15/11/07 23:55:09 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.NoClassDefFoundError: org/apache/flume/source/avro/AvroSourceProtocol
at org.apache.spark.streaming.flume.FlumeReceiver.responder$lzycompute(FlumeInputDStream.scala:147)
at org.apache.spark.streaming.flume.FlumeReceiver.responder(FlumeInputDStream.scala:146)
at org.apache.spark.streaming.flume.FlumeReceiver.initServer(FlumeInputDStream.scala:163)
at org.apache.spark.streaming.flume.FlumeReceiver.onStart(FlumeInputDStream.scala:170)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:542)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:532)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassNotFoundException: org.apache.flume.source.avro.AvroSourceProtocol
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 16 more
15/11/07 23:55:09 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
15/11/07 23:55:09 ERROR ReceiverTracker: Deregistered receiver for stream 0: Stopped by driver我的问题是,它与spark示例中提供的flume_wordcount.py代码相同,示例中的代码运行良好,但我的版本不起作用。不同之处在于它使用运行示例运行的方式--一个使用运行示例,另一个使用火花提交,这指向了类路径和jar文件的管理方式。有什么我该做的吗?
发布于 2015-11-09 04:15:10
解决了这个问题,必须有正确的jar并将其传递给spark submit。
$spark-submit --jars /path/to/spark-streaming-flume-assembly*.jar FlumeStreaming.py localhost 12345https://stackoverflow.com/questions/33591976
复制相似问题