I am using spark 1.6 and came across this function reduceByKeyAndWindow which I am using to perform word count over data transmitted over a kafka topic.
我正在使用spark 1.6并遇到了这个函数reduceByKeyAndWindow,我用它来对通过kafka主题传输的数据执行字数统计。
Following is the list of alternatives reduceByKeyAndWindow is providing. As we can see, all the alternatives has similar signatures with extra parameters.
以下是reduceByKeyAndWindow提供的备选列表。我们可以看到,所有替代品都有类似的签名和额外的参数。
But when I just use reduceByKeyAndWindow with my reduce function or with my reduce function and duration, it works and doesn't give me any errors as shown below.
但是当我只使用reduceByKeyAndWindow和我的reduce函数或者我的reduce函数和持续时间时,它可以工作,并且不会给我任何错误,如下所示。
But when I use the alternative with reduce function, duration and sliding window time it starts giving me the following error, same happens with the other alternatives, as shown below.
但是当我使用具有reduce功能,持续时间和滑动窗口时间的替代方案时,它开始给我以下错误,其他替代方案也是如此,如下所示。
I am not really sure what is happening here and how can I fix the problem.
我不确定这里发生了什么,我该如何解决问题。
Any help is appreciated
任何帮助表示赞赏
1 个解决方案
#1
0
If you comment this line .words.map(x => (x, 1L))
you should be able to use the method [.reduceByWindow(_+_, Seconds(2), Seconds(2))
] from DStream
.
如果你注释这一行.words.map(x =>(x,1L))你应该可以使用DStream中的方法[.reduceByWindow(_ + _,Seconds(2),Seconds(2))]。
If you transform the words to words with count, then you should use the below method.
如果将单词转换为带计数的单词,则应使用以下方法。
reduceByKeyAndWindow(_ + _, _ - _, Minutes(10), Seconds(2), 2)
Please see the documentation on more details for what are those reduce function
and inverse reduce
function https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala
有关减少功能和反向减少功能的详细信息,请参阅文档https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream /DStream.scala
#1
0
If you comment this line .words.map(x => (x, 1L))
you should be able to use the method [.reduceByWindow(_+_, Seconds(2), Seconds(2))
] from DStream
.
如果你注释这一行.words.map(x =>(x,1L))你应该可以使用DStream中的方法[.reduceByWindow(_ + _,Seconds(2),Seconds(2))]。
If you transform the words to words with count, then you should use the below method.
如果将单词转换为带计数的单词,则应使用以下方法。
reduceByKeyAndWindow(_ + _, _ - _, Minutes(10), Seconds(2), 2)
Please see the documentation on more details for what are those reduce function
and inverse reduce
function https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala
有关减少功能和反向减少功能的详细信息,请参阅文档https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream /DStream.scala