Job executed with no data in Spark Streaming

Time: 2021-09-19 00:46:34

My code:

// messages is a JavaPairDStream<K, V>
Fun01(messages);
Fun02(messages);
Fun03(messages);

Fun01, Fun02, and Fun03 all contain transformations and output operations (foreachRDD).
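
For context, each FunXX has roughly the following shape; the names, key/value types, and the body below are assumptions for this sketch, not my actual code:

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import scala.Tuple2;

// Hypothetical skeleton: a transformation followed by an output
// operation (foreachRDD), so Spark actually schedules a job for it.
static void Fun01(JavaPairDStream<String, String> messages) {
    messages
        .filter(new Function<Tuple2<String, String>, Boolean>() {  // a transformation
            @Override
            public Boolean call(Tuple2<String, String> t) {
                return t._2() != null;
            }
        })
        .foreachRDD(new Function<JavaPairRDD<String, String>, Void>() {  // the output operation
            @Override
            public Void call(JavaPairRDD<String, String> rdd) {
                System.out.println("Fun01 batch size: " + rdd.count());
                return null;
            }
        });
}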

  1. Fun01 and Fun03 both executed as expected, which proves that "messages" is neither null nor empty.

  2. On the Spark application UI, Fun02's output stage appears under "Spark stages", which proves that it was executed.

  3. The first line of Fun02 is a map function, and I added logging inside it. I also added logging for every step in Fun02; all of the logs show that no data arrives (see the sketch after this list).
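
To make point 3 concrete, here is a hedged sketch of how the logging was wired into the first map step; the key/value types and log text are assumptions, not my exact code:

import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import scala.Tuple2;

// Hypothetical sketch: log every record entering Fun02's first map.
// If the stage shows up in the UI but these lines never appear in the
// executor logs, the stage is being scheduled against empty data.
JavaPairDStream<String, Long> msg_02 = messages.mapToPair(
    new PairFunction<Tuple2<String, String>, String, Long>() {
        @Override
        public Tuple2<String, Long> call(Tuple2<String, String> record) {
            System.out.println("Fun02 map saw key = " + record._1());
            return new Tuple2<String, Long>(record._1(), 1L);
        }
    });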

Does anybody know the possible reasons? Thanks very much.


@maasg Fun02's logic is:

msg_02 = messages.mapToPair(...);          // first map (logging added here)
msg_03 = msg_02.reduceByKeyAndWindow(...); // first windowed reduce
msg_04 = msg_03.mapValues(...);
msg_05 = msg_04.reduceByKeyAndWindow(...); // second windowed reduce
msg_06 = msg_05.filter(...);

msg_07 = msg_06.filter(...);
msg_07.cache();                            // cache before the output operation
msg_07.foreachRDD(...);                    // output operation

I have tested this on Spark-1.1 and Spark-1.2, the versions supported by my company's Spark cluster.

1 solution

#1


It seems that this is a bug in Spark-1.1 and Spark-1.2, fixed in Spark-1.3.

I posted my test results here: http://secfree.github.io/blog/2015/05/08/spark-streaming-reducebykeyandwindow-data-lost.html .

When two reduceByKeyAndWindow operations are chained, data loss may occur depending on the window and slide durations.
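
A minimal sketch of the failing pattern, assuming an input pair stream named counts; the window and slide durations below are illustrative assumptions, not the exact values from my job:

import org.apache.spark.api.java.function.Function2;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaPairDStream;

// Two chained windowed reduces. In my tests on Spark-1.1/1.2, some
// window/slide combinations left the second window with no data.
Function2<Long, Long, Long> sum = new Function2<Long, Long, Long>() {
    @Override
    public Long call(Long a, Long b) {
        return a + b;
    }
};

// counts is an assumed JavaPairDStream<String, Long> built earlier.
JavaPairDStream<String, Long> firstWindow = counts.reduceByKeyAndWindow(
    sum, new Duration(30 * 1000), new Duration(10 * 1000));
JavaPairDStream<String, Long> secondWindow = firstWindow.reduceByKeyAndWindow(
    sum, new Duration(60 * 1000), new Duration(20 * 1000));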

I cannot find this bug in Spark's issue tracker, so I cannot get a patch.
