What is the purpose of "spark.streaming.blockInterval" in Spark Streaming DirectAPI?

I want to understand what role "spark.streaming.blockInterval" plays in Spark Streaming DirectAPI. As I understand it, "spark.streaming.blockInterval" is used to calculate the number of partitions, i.e. #partitions = (receivers * batchInterval) / blockInterval, but with DirectAPI the number of Spark Streaming partitions is equal to the number of Kafka partitions.


How is "spark.streaming.blockInterval" used in DirectAPI?


1 Answer

#1

spark.streaming.blockInterval:

Interval at which data received by Spark Streaming receivers is chunked into blocks of data before storing them in Spark.

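To make the chunking concrete, here is a minimal sketch in plain Python. It is a simplified model, not Spark's actual receiver code: the function name chunk_into_blocks and the sample arrival times are made up for illustration, and records are represented only by their arrival time in milliseconds. The 200 ms value is the documented default of spark.streaming.blockInterval.

```python
from collections import defaultdict


def chunk_into_blocks(arrival_times_ms, block_interval_ms=200):
    """Group record arrival times into blocks, one block per interval that saw data.

    Hypothetical helper: a simplified model of receiver-side chunking.
    """
    blocks = defaultdict(list)
    for t in arrival_times_ms:
        # Records arriving within the same interval end up in the same block.
        blocks[t // block_interval_ms].append(t)
    # Return the non-empty blocks in time order.
    return [blocks[k] for k in sorted(blocks)]


arrivals = [10, 150, 210, 390, 405, 990]  # made-up arrival times in ms
blocks = chunk_into_blocks(arrivals)
print(len(blocks), blocks)
```

In this model every block that a receiver produces during a batch becomes one partition of that batch's RDD, which is where the blockInterval-based partition formula in the question comes from.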

KafkaUtils.createDirectStream(), however, does not use a receiver, so this setting has no effect on it.


With directStream, Spark Streaming creates as many RDD partitions as there are Kafka partitions to consume.

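The difference between the two approaches can be sketched as plain arithmetic. This is only an illustration under stated assumptions: the function names receiver_partitions and direct_partitions are hypothetical, the numbers are made up, and the receiver-based figure is an upper bound that assumes every block interval actually receives data.

```python
def receiver_partitions(num_receivers, batch_interval_ms, block_interval_ms):
    """Receiver-based streams: roughly one partition per block per receiver.

    Hypothetical helper; assumes data arrives in every block interval.
    """
    return num_receivers * (batch_interval_ms // block_interval_ms)


def direct_partitions(num_kafka_partitions, block_interval_ms=None):
    """Direct stream: one RDD partition per Kafka partition being consumed.

    block_interval_ms is accepted only to show that it is ignored.
    """
    return num_kafka_partitions


# Made-up numbers: 2 receivers, 2 s batches, 200 ms blocks.
print(receiver_partitions(2, 2000, 200))
# Made-up numbers: 8 Kafka partitions, with two different block intervals.
print(direct_partitions(8, 200), direct_partitions(8, 50))
```

Changing the block interval changes the first result but not the second, which is the point of the answer: with the direct stream, parallelism is controlled by the number of Kafka partitions (or by an explicit repartition), not by spark.streaming.blockInterval.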
