My code:
// messages is JavaPairDStream<K, V>
Fun01(messages)
Fun02(messages)
Fun03(messages)
Fun01, Fun02, and Fun03 each contain transformations and output operations (foreachRDD).
-
Fun01 and Fun03 both execute as expected, which proves that "messages" is neither null nor empty.
-
On the Spark application UI, I found Fun02's output stage under "Spark stages", which proves that it was executed.
-
The first line of Fun02 is a map function, and I added logging inside it. I also added logging for every step in Fun02; all of the logs show that no data passes through.
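The instrumentation described above can be sketched in plain Java (not Spark — the names and numbers here are hypothetical; in Spark the same counter or log call would go inside the Function passed to mapToPair):

```java
import java.util.ArrayList;
import java.util.List;

public class LoggedMap {
    // Element counter, standing in for one log line per element.
    static long seen = 0;

    // The transformation step, instrumented as the question describes:
    // every element that actually reaches the map bumps the counter.
    static int step(int x) {
        seen++;
        return x * 2;  // placeholder transformation
    }

    static List<Integer> run(List<Integer> batch) {
        List<Integer> out = new ArrayList<>();
        for (int x : batch) out.add(step(x));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(1, 2, 3)) + " seen=" + seen);
        // An empty batch leaves the counter unchanged even though the
        // stage "ran" -- the symptom observed in Fun02.
        run(List.of());
        System.out.println("after empty batch: seen=" + seen);
    }
}
```

If the stage shows up in the UI but such a counter never advances, the stage is being scheduled on empty RDDs, which is consistent with the windowing issue described in the answer below.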
Does anybody know possible reasons? Thanks very much.
@maasg Fun02's logic is:
msg_02 = messages.mapToPair(...)
msg_03 = msg_02.reduceByKeyAndWindow(...)
msg_04 = msg_03.mapValues(...)
msg_05 = msg_04.reduceByKeyAndWindow(...)
msg_06 = msg_05.filter(...)
msg_07 = msg_06.filter(...)
msg_07.cache()
msg_07.foreachRDD(...)
I have tested on Spark-1.1 and Spark-1.2, which are the versions supported by my company's Spark cluster.
1 Answer
#1
It seems that this is a bug in Spark-1.1 and Spark-1.2, fixed in Spark-1.3.
I have posted my test results here: http://secfree.github.io/blog/2015/05/08/spark-streaming-reducebykeyandwindow-data-lost.html .
When two reduceByKeyAndWindow operations are used back to back, data loss may occur depending on the window and slide values.
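To see why the window and slide values matter, here is an illustrative model (plain Java, not Spark, and not a reproduction of the bug itself) of which batch indexes each sliding-window emission covers, assuming a batch interval of 1 time unit and hypothetical window/slide parameters:

```java
import java.util.ArrayList;
import java.util.List;

public class WindowCoverage {
    // For each window emission, returns the list of batch indexes it covers,
    // given a window length and slide interval in batch units.
    static List<List<Integer>> windows(int numBatches, int windowLen, int slide) {
        List<List<Integer>> out = new ArrayList<>();
        for (int end = slide; end <= numBatches; end += slide) {
            List<Integer> w = new ArrayList<>();
            for (int b = Math.max(0, end - windowLen); b < end; b++) w.add(b);
            out.add(w);
        }
        return out;
    }

    public static void main(String[] args) {
        // e.g. window length 4, slide 2: emissions at t=2,4,6,8, overlapping.
        System.out.println(windows(8, 4, 2));
    }
}
```

A second reduceByKeyAndWindow consumes the emissions of the first as its input batches, so its effective coverage is a window over windows; if the chained parameters interact badly in the affected Spark versions, some of the underlying batches can end up contributing to no final emission, which matches the "no data" symptom.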
I cannot find the bug in Spark's issue list, so I cannot get the patch.