I am using Spark Streaming to read text files from a folder and load them into Hive.
The Spark Streaming batch interval is 1 minute. In rare cases, the source folder may contain up to 1000 large files.
How do I control Spark Streaming to limit the number of files the program reads? Currently my program reads all files generated in the last minute, but I want to cap the number of files it picks up per batch.
I am using the textFileStream API.
JavaDStream<String> lines = jssc.textFileStream("C:/Users/abcd/files/");
Is there any way to control the file streaming rate?
2 Answers
#1
0
I am afraid not. Spark Streaming is time-driven. You could instead use Flink, which provides data-driven windowing:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/concepts/programming-model.html#windows
#2
0
You could use "spark.streaming.backpressure.enabled" and "spark.streaming.backpressure.initialRate" to control the rate at which data is received.
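A minimal sketch of how these settings might be supplied, assuming they are placed in `spark-defaults.conf` (each line can equivalently be passed as a `--conf` flag to `spark-submit`). Note that Spark's backpressure mechanism is documented for receiver-based streams and the Kafka direct stream, so whether it actually limits the number of files a `textFileStream` reads per batch is not guaranteed; the rate value of 1000 below is an arbitrary illustration:

```properties
# spark-defaults.conf (or pass each entry via --conf to spark-submit)

# Enable the backpressure algorithm that adapts ingestion rate to processing speed
spark.streaming.backpressure.enabled      true

# Initial rate (records per second) used before the algorithm has feedback
# from completed batches; 1000 is an illustrative value, not a recommendation
spark.streaming.backpressure.initialRate  1000
```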