EDIT: ANSWER: It was a JAR file causing a conflict! The related post is:
Doing the following:
val numOfProcessors:Int = 2
val filePath:java.lang.String = "s3n://somefile.csv"
var rdd:org.apache.spark.rdd.RDD[java.lang.String] = sc.textFile(filePath, numOfProcessors)
I get:
error: type mismatch;
found : org.apache.spark.rdd.org.apache.spark.rdd.
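For reference, the usual way to avoid this kind of duplicated-package conflict is to keep a second copy of Spark out of the application JAR. A minimal build.sbt sketch, assuming the job is assembled with sbt (the version number is illustrative):

// build.sbt: mark Spark as "provided" so the assembled JAR does not
// bundle a second, conflicting copy of the org.apache.spark classes
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0" % "provided"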
What is wrong with this example?
val f = sc.parallelize(Array((1,1),(1,2)))
val p = new org.apache.spark.rdd.PairRDDFunctions[Int,Int](f)
Name: Compile Error
Message: error: type mismatch;
found : org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.RDD[(Int, Int)]
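As an aside, PairRDDFunctions is rarely instantiated by hand: the pair operations normally arrive through an implicit conversion on RDD, which also avoids spelling out the fully qualified class name. A minimal spark-shell sketch, assuming sc is the shell's SparkContext:

val f = sc.parallelize(Array((1, 1), (1, 2)))
// the implicit conversion to PairRDDFunctions supplies reduceByKey
val sums = f.reduceByKey(_ + _)
sums.collect() // Array((1,3))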
While trying the Naive Bayes example, I ran into this problem on Spark 1.4 on Ubuntu. I have seen posts with similar problems where the fix was a JAR mismatch (via Maven), but in this case the offending class is packaged with Spark itself, so I am not sure how to proceed.
scala> val model = NaiveBayes.train(training, lambda = 1.0, modelType = "multinomial")
<console>:46: error: type mismatch;
found : org.apache.spark.rdd.org.apache.spark.rdd.
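For comparison, here is a self-contained sketch of the same call with its imports, using a made-up two-point training set; if even this fails with the same nested org.apache.spark.rdd... error, the problem is the classpath rather than the code:

import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// toy training data: two labeled points, one per class
val training = sc.parallelize(Seq(
  LabeledPoint(0.0, Vectors.dense(1.0, 0.0)),
  LabeledPoint(1.0, Vectors.dense(0.0, 1.0))))
val model = NaiveBayes.train(training, lambda = 1.0, modelType = "multinomial")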
I parallelize some integers and try to save them as a text file, as follows:
scala> val test = sc.parallelize(List(12,2,3,4))
test: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24
Saving as a text file:
scala> test.saveAsTextFile("/test")
The error stack trace is as follows:
java.lang.NoSuchMethodError: org.apache.hadoop.mapre
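A NoSuchMethodError inside org.apache.hadoop.* usually means the Hadoop JARs on the classpath do not match the Hadoop version Spark was built against. A quick diagnostic sketch from the shell (it only reports versions, it does not fix anything):

println(sc.version)                                    // the running Spark version
println(org.apache.hadoop.util.VersionInfo.getVersion) // the Hadoop version actually on the classpath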
I use a Future to perform a blocking operation on the RDDs, as follows:
dStreams.foreach(_.foreachRDD { rdd =>
Future{ writeRDD(rdd) }
})
Sometimes I get an error like this:
org.apache.spark.SparkException: Job aborted due to stage failure: Task creation failed: org.apache.spark.SparkException: Attempted to use BlockRDD[820] at actorStream at Tests.scala:149 a
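Errors like this usually mean the asynchronous write touched the BlockRDD after its batch had completed and Spark Streaming had already dropped the underlying blocks. A minimal sketch of the simplest remedy, assuming writeRDD is the user's own helper: perform the blocking write inside foreachRDD itself, so the batch cannot finish before the write does:

dStreams.foreach(_.foreachRDD { rdd =>
  // synchronous: the batch, and the BlockRDD's blocks, stay alive
  // until writeRDD returns
  writeRDD(rdd)
})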
So I created a job that calls a Python script and performs PySpark transformations. However, when I look at the output in AWS CloudWatch, it contains a lot of information that is not important to me. For example:
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:199)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:196)
at org.apache.spark.rdd.NewHadoopRDD.compute(New
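If the unwanted lines are Spark's own INFO-level logging, lowering the log level is the usual first step; the call below exists under the same name in both the Scala and Python APIs (a sketch, assuming sc is the active SparkContext):

sc.setLogLevel("WARN") // or "ERROR" to keep only failures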
I am trying to upgrade Hugging Face beyond our current version, 2.11. When I install any newer version of transformers in an Azure notebook via pip install transformers=={any version}, I get the following error during execution. I am very new to this, but any feedback on how to troubleshoot would be greatly appreciated. Thanks.
org.apache.spark.SparkException: Cloned Python environment not found at /local_disk0/.ephemeral_nfs/envs/pythonEnv-89bc8046-d7ae-4968-b280-fc233a9bf3e4
at org.apache.spark.ap
Sometimes I get the error below when launching spark-shell; it happens only occasionally, not every time. Why?
[smile@10-149-11-158 ~]$ cd /data/slot0/spark
[smile@10-149-11-158 spark]$ ./bin/spark-shell --master spark://10-149-11-157:7077
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initiali
When I save a PySpark DataFrame as a Parquet file, I get the following error:
Py4JJavaError: An error occurred while calling o50.parquet.
: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:224)
at org.apache.spark.sql.execution.dataso
I am using Python 3.5 and Spark 2.2 Streaming with Kafka, and the script fails to run because the kafka library is missing.
I don't understand why this library is missing / not found, even though the dependency information comes from Spark's own website.
groupId = org.apache.spark
artifactId = spark-streaming-kafka-0-10_2.11
version = 2.2.0
I ran "spark-submit script.py" and the error says the Kafka library is required.
Spark Streaming's Kafka libraries not found in class path. Try one
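The Maven coordinates on the website describe the artifact but do not put it on the classpath at run time; it still has to be passed to spark-submit, for example with --packages. A sketch (note that in Spark 2.2 the Python DStream KafkaUtils API is backed by the 0-8 connector, not 0-10):

spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 script.py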