Spark中reduceByKeyAndWindow函数的奇怪行为

I am using spark 1.6 and came across this function reduceByKeyAndWindow which I am using to perform word count over data transmitted over a kafka topic.

我正在使用spark 1.6并遇到了这个函数reduceByKeyAndWindow,我用它来对通过kafka主题传输的数据执行字数统计。

Following is the list of alternatives reduceByKeyAndWindow is providing. As we can see, all the alternatives has similar signatures with extra parameters.

以下是reduceByKeyAndWindow提供的备选列表。我们可以看到,所有替代品都有类似的签名和额外的参数。

But when I just use reduceByKeyAndWindow with my reduce function or with my reduce function and duration, it works and doesn't give me any errors as shown below.

但是当我只使用reduceByKeyAndWindow和我的reduce函数或者我的reduce函数和持续时间时,它可以工作,并且不会给我任何错误,如下所示。

But when I use the alternative with reduce function, duration and sliding window time it starts giving me the following error, same happens with the other alternatives, as shown below.

但是当我使用具有reduce功能,持续时间和滑动窗口时间的替代方案时,它开始给我以下错误,其他替代方案也是如此,如下所示。

I am not really sure what is happening here and how can I fix the problem.

我不确定这里发生了什么,我该如何解决问题。

Any help is appreciated

任何帮助表示赞赏

1 个解决方案

#1

If you comment this line .words.map(x => (x, 1L)) you should be able to use the method [.reduceByWindow(_+_, Seconds(2), Seconds(2))] from DStream.

如果你注释这一行.words.map(x =>(x,1L))你应该可以使用DStream中的方法[.reduceByWindow(_ + _,Seconds(2),Seconds(2))]。

If you transform the words to words with count, then you should use the below method.

如果将单词转换为带计数的单词,则应使用以下方法。

reduceByKeyAndWindow(_ + _, _ - _, Minutes(10), Seconds(2), 2)

Please see the documentation on more details for what are those reduce function and inverse reduce function https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala

有关减少功能和反向减少功能的详细信息,请参阅文档https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream /DStream.scala

#1