I am using Apache Spark Streaming to do some real-time processing of my web service API logs. The source stream is just a series of API calls with return codes, and my Spark app mainly aggregates over the raw API call logs, counting how many API calls return each HTTP code.
The batch interval on the source stream is 1 second. Then I do:
inputStream.reduceByKey(_ + _), where inputStream is of type DStream[(String, Int)]. Now I have the result DStream level1. Then I call reduceByKeyAndWindow on level1 over 60 seconds:
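The per-batch reduction can be illustrated with plain Scala (hypothetical log data, no Spark involved), since reduceByKey(_ + _) simply sums the values for each key within a batch:

```scala
// Hypothetical one-second batch of (HTTP code, 1) pairs, mimicking
// the input to reduceByKey(_ + _) within a single batch
val batch = Seq(("200", 1), ("404", 1), ("200", 1), ("500", 1), ("200", 1))

// Group by key and sum the counts -- same semantics as reduceByKey(_ + _)
val reduced: Map[String, Int] =
  batch.groupBy(_._1).map { case (code, hits) => code -> hits.map(_._2).sum }

println(reduced("200"))  // 3
```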
val level2 = level1.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(60), Seconds(60))
Then I want to do a further aggregation (say level3) over a longer period (say 3600 seconds) on top of DStream level2:
val level3 = level2.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(3600), Seconds(3600))
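Assuming the documented window semantics, the two layers should compose like this plain-Scala simulation (hypothetical counts, no Spark involved): 120 one-second batches roll up into two 60-second windows, which in turn roll up into one larger window.

```scala
// Merge a window of already-reduced per-key maps by summing per key,
// analogous to reduceByKeyAndWindow with (a, b) => a + b
def reduceWindow(w: Seq[Map[String, Int]]): Map[String, Int] =
  w.flatten.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

// level1: 120 one-second batches, each counting a single "200" response
val level1 = Seq.fill(120)(Map("200" -> 1))

// level2: non-overlapping 60-second windows over level1
val level2 = level1.grouped(60).map(reduceWindow).toSeq

// level3: a window over level2 (here covering everything seen so far)
val level3 = reduceWindow(level2)

println(level2.map(_("200")))  // List(60, 60)
println(level3("200"))         // 120
```

If layering windowed streams worked as expected, level3 would carry the summed counts rather than being empty.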
My problem now is: I only get aggregated data on level2, while level3 is empty.
My understanding is that level3 should not be empty; it should aggregate over the level2 stream.
Of course I can change level3 to aggregate over level1 instead of level2. But I don't understand why aggregating over level2 does not work.
It seems to me that you can only do one layer of reduceByKeyAndWindow on the source stream. Any further layer of reduceByKeyAndWindow on top of a stream already reduced by key and window won't work.
Any ideas?
1 Answer
#1
Yes, I think this is a bug in Spark Streaming. It seems that the window operation on a windowed stream does not work. I'm investigating the reason as well and will keep this updated with any findings.
Similar question: Windows of windowed streams not displaying the expected results