I am trying to read data from Kafka with Spark Structured Streaming. However, in Spark 2.4.0 you cannot set the group id for the stream (see How to set group.id for consumer group in kafka data source in Structured Streaming?). Since it cannot be set, Spark simply generates a group id, and I am stuck with a GroupAuthorizationException:

19/12/10 15:15:00 ERROR streaming.MicroBatchExecution: Query [id = 747090ff-12
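For reference, Spark 3.0 and later expose a `kafka.group.id` source option that lets you pin the consumer group to one your ACLs authorize; on Spark 2.4 this option is rejected and Spark generates a unique group id per query. A minimal sketch, assuming Spark 3.x and placeholder broker, topic, and group names:

```python
# Sketch: fixing the Kafka consumer group id (requires Spark 3.0+).
# "xxx:9092", "my-topic", and "my-authorized-group" are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-group-id-demo").getOrCreate()

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "xxx:9092")
      .option("subscribe", "my-topic")
      # Available from Spark 3.0; on 2.4 setting group.id is not allowed
      # and Spark generates a group id of its own for each query.
      .option("kafka.group.id", "my-authorized-group")
      .load())
```

On Spark 2.4, the usual workaround is instead to grant the ACL on a group-id prefix matching the ids Spark generates.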
I am writing a Scala program that connects Spark Streaming to Kafka, and I am getting the following error:
18/02/19 12:31:39 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 39)
org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions: {prensa4-0=744}
at org.apache.kafka.clients.
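This exception typically means the offsets the consumer wants to read have already been deleted by Kafka's retention policy, and no reset policy is configured to fall back on. Setting `auto.offset.reset` in the Kafka parameters tells the consumer where to restart. A minimal sketch of the parameter map (broker address and group id are placeholders):

```python
# Sketch: Kafka consumer parameters with an explicit reset policy.
# "xxx:9092" and "prensa-consumer" are placeholders.
kafka_params = {
    "bootstrap.servers": "xxx:9092",
    "group.id": "prensa-consumer",
    # When the stored offsets are out of range (e.g. the data was
    # deleted by retention), fall back to the earliest available
    # offset instead of raising OffsetOutOfRangeException.
    # Use "latest" to skip ahead to new records instead.
    "auto.offset.reset": "earliest",
}

print(kafka_params["auto.offset.reset"])
```

Note that `"earliest"`/`"latest"` are the values for the new (0.10+) consumer, which is the one raising this exception; the old 0.8 consumer used `"smallest"`/`"largest"`.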
I am experimenting with consumer groups.
Here is my code snippet:
public final class App {

    private static final int INTERVAL = 5000;

    public static void main(String[] args) throws Exception {
        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "xxx:9092");
        kafka
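To recap what the experiment should show: a consumer group divides a topic's partitions among its members, so each partition is consumed by exactly one member of the group. A toy sketch of round-robin-style assignment (the function and names are illustrative, not Kafka's actual API):

```python
def assign_partitions(partitions, consumers):
    """Toy round-robin assignment: each partition goes to exactly one
    consumer in the group, spread as evenly as possible."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# Two consumers in one group splitting four partitions:
print(assign_partitions([0, 1, 2, 3], ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1, 3]}
```

Consumers in *different* groups, by contrast, each receive every record of the topic independently.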
I am trying to use a window with Structured Streaming on Spark and Kafka. I am using a window over non-time-based data, so I get this error:
'Non-time-based windows are not supported on streaming DataFrames/Datasets;;\nWindow
Here is my code:
window = Window.partitionBy("input_id").orderBy("similarity")
outputDf = inputDf\
.crossJoin(ticketDf.with
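The error is expected: `Window.partitionBy(...).orderBy(...)` is a batch-only feature, and streaming DataFrames only support time-based windows built with `pyspark.sql.functions.window` over an event-time column. A minimal sketch, using the built-in `rate` source (which emits a `timestamp` column) as a stand-in for the Kafka stream; the window and watermark durations are placeholders:

```python
# Sketch: time-based windowing on a streaming DataFrame.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("stream-window-demo").getOrCreate()

# The rate source stands in for a Kafka stream; it produces a
# "timestamp" column we can window on.
inputDf = spark.readStream.format("rate").option("rowsPerSecond", 1).load()

# Streaming queries support only time-based windows: group by a
# window over the event-time column instead of Window.partitionBy.
windowed = (inputDf
            .withWatermark("timestamp", "10 minutes")
            .groupBy(F.window("timestamp", "5 minutes"))
            .count())
```

If you truly need a rank-style window over `similarity`, the usual approach is to do it per micro-batch with `foreachBatch`, where batch semantics (and `Window`) apply.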
When I create a stream from a Kafka topic and print its contents:
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 pyspark-shell'
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka impo