I am trying to read data from Kafka with Spark Structured Streaming. In Spark 2.4.0, however, you cannot set a group id for the stream (see "How to set group.id for consumer group in kafka data source in Structured Streaming?"). Since none is set, Spark just generates a group id on its own, and I am stuck with a GroupAuthorizationException:
19/12/10 15:15:00 ERROR streaming.MicroBatchExecution: Query [id = 747090ff-12
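In Spark 2.4 the generated group id has the form spark-kafka-source-<uuid>, so one workaround reported for this is to have the Kafka administrator grant a prefixed Read ACL on the group name spark-kafka-source-. From Spark 3.0 on, the source also accepts a kafka.group.id (or groupIdPrefix) option directly. A minimal sketch assuming Spark 3.0+; broker, topic, and group names are placeholders:

// Assumes Spark 3.0+, where the Kafka source accepts kafka.group.id / groupIdPrefix.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")   // placeholder
  .option("subscribe", "my-topic")                     // placeholder
  .option("kafka.group.id", "my-authorized-group")     // use a pre-authorized consumer group
  .load()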
I keep getting this error message:
The message is 1169350 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration.
As pointed out in other StackOverflow posts, I am trying to set the "max.request.size" configuration on the producer, like this:
.writeStream
.format("kafka")
.option("kafka.bootstrap.
I am trying to connect to a Kafka topic with Spark. It does not read any data from the stream, and it does not raise any errors. Here is my Jupyter code:
import os
# Pin the Kafka 0.8 connector matching the installed Spark/Scala version.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 pyspark-shell'

from pprint import pprint
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
I got this error when trying to stream from Spark (using Java) to a secured Kafka cluster (using the SASL PLAINTEXT mechanism).
The more detailed error message:
17/07/07 14:38:43 INFO SimpleConsumer: Reconnect due to socket error: java.io.EOFException: Received -1 when reading from a channel, the socket has likely been closed.
Exception in thread "main" org.apache.spark.SparkException: j
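One likely cause, judging from the SimpleConsumer line in the log (an assumption, since the code is not shown): the spark-streaming-kafka-0-8 integration uses the old consumer API, which has no SASL support; the 0-10 integration and the Structured Streaming source do. A sketch with the Structured Streaming source, passing the client security settings with the kafka. prefix; broker, topic, and credentials are placeholders:

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9093")   // placeholder
  .option("subscribe", "secured-topic")                // placeholder
  .option("kafka.security.protocol", "SASL_PLAINTEXT")
  .option("kafka.sasl.mechanism", "PLAIN")
  .option("kafka.sasl.jaas.config",
    "org.apache.kafka.common.security.plain.PlainLoginModule required " +
    "username=\"user\" password=\"secret\";")          // placeholder credentials
  .load()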
Project setup:
1 producer - serializes objects & sends the bytes to Kafka.
1 Spark consumer - should consume the bytes using the DefaultDecoder from the kafka.serializer package.
Issue:
SBT imports the correct libraries (kafka-clients + kafka_2.10), but no classes from the kafka_2.10 jar can be found.
It seems to be searching under the wrong path (org.apache.spark.streaming.kafka instead of org.apache.kafka).
Error message:
object serializer is not a member of package org.apache.spa
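The message suggests the import was resolved relative to an org.apache.spark package, e.g. something like import org.apache.spark.streaming.kafka.serializer... (an assumption; the actual import is cut off above). DefaultDecoder lives in the top-level kafka.serializer package of the kafka_2.10 jar. A sketch of the imports and a matching byte-level direct stream; broker, topic, and the StreamingContext ssc are placeholders:

import kafka.serializer.DefaultDecoder                 // top-level kafka package, from the kafka_2.10 jar
import org.apache.spark.streaming.kafka.KafkaUtils     // from spark-streaming-kafka (0.8 integration)

// Consume raw bytes, decoding key and value with DefaultDecoder.
val stream = KafkaUtils.createDirectStream[
    Array[Byte], Array[Byte], DefaultDecoder, DefaultDecoder](
  ssc,                                                 // an existing StreamingContext
  Map("metadata.broker.list" -> "broker1:9092"),       // placeholder
  Set("my-topic"))                                     // placeholder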