Does Dataflow avoid duplicate computation in SlidingWindows?

Time: 2022-09-21 15:33:41

In Cloud Dataflow, an element may be assigned to multiple windows when using a SlidingWindow, which has a size and a step. Suppose we have a SlidingWindow with a large size and a very small step. Two adjacent windows would then contain almost the same elements, differing only by those that fall within one sliding step.
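
To make the overlap concrete, here is a minimal plain-Python sketch of how a timestamp could map to sliding windows. It is a model written for illustration, not the Dataflow SDK itself: the function name `assign_sliding_windows` and its parameters are hypothetical, and windows are assumed to be half-open intervals `[start, start + size)` aligned to multiples of the step.

```python
def assign_sliding_windows(timestamp, size, period, offset=0):
    """Return every [start, end) window that contains `timestamp`.

    Hypothetical helper: `size` is the window length, `period` is the
    sliding step, and window starts are assumed to be aligned to
    `offset + k * period`.
    """
    # Start of the latest window that still contains the timestamp.
    last_start = timestamp - ((timestamp - offset) % period)
    windows = []
    start = last_start
    # Walk backwards one step at a time while the window still covers it.
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= period
    return windows


# A 60-unit window sliding every 10 units: one element lands in 6 windows.
windows = assign_sliding_windows(timestamp=125, size=60, period=10)
for start, end in windows:
    print(f"[{start}, {end})")
```

Under these assumptions each element falls into roughly size / step windows, so a large size with a very small step multiplies the per-element work accordingly.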

So does computing on each SlidingWindow simply load all the elements in that window and trigger a computation on them? Or can adjacent windows reuse some computed results to avoid duplicate work? And is an element copied when it is assigned to multiple windows?

1 solution

#1


Dataflow does not have any special handling for SlidingWindows like this. The element occurs in every window to which it is assigned.

We typically haven't found performance problems using regular SlidingWindows with a CombineFn afterwards. We would suggest trying that first and following up with more details on what you're trying to compute and specifics on your windowing if you're having problems.
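
As a rough illustration of why a CombineFn after SlidingWindows tends to be affordable, the sketch below models the idea in plain Python: each window keeps only a small accumulator that is updated as elements arrive, rather than a buffered list of elements. The method names mirror the usual CombineFn shape (`create_accumulator`, `add_input`, `merge_accumulators`, `extract_output`), but `MeanFn`, `window_starts`, and `combine_per_window` are hypothetical names for this example, and the loop is only an assumption about how a runner might apply the combiner.

```python
from collections import defaultdict


class MeanFn:
    """Plain-Python stand-in for a mean-computing CombineFn (illustrative)."""

    def create_accumulator(self):
        return (0.0, 0)  # (running sum, element count)

    def add_input(self, acc, value):
        total, count = acc
        return (total + value, count + 1)

    def merge_accumulators(self, accs):
        # Partial accumulators can be combined without revisiting elements.
        totals, counts = zip(*accs)
        return (sum(totals), sum(counts))

    def extract_output(self, acc):
        total, count = acc
        return total / count if count else float("nan")


def window_starts(timestamp, size, period):
    """Yield the start of every sliding window containing `timestamp`."""
    start = timestamp - (timestamp % period)
    while start > timestamp - size:
        yield start
        start -= period


def combine_per_window(events, size, period, fn):
    """Fold (timestamp, value) events into one accumulator per window."""
    accumulators = defaultdict(fn.create_accumulator)
    for timestamp, value in events:
        # The element is visited once per window it belongs to ...
        for start in window_starts(timestamp, size, period):
            # ... but only the constant-size accumulator is stored.
            accumulators[start] = fn.add_input(accumulators[start], value)
    return {s: fn.extract_output(a) for s, a in sorted(accumulators.items())}


events = [(1, 10.0), (12, 20.0), (25, 30.0)]
means = combine_per_window(events, size=20, period=10, fn=MeanFn())
print(means)
```

In this model the per-element cost still grows with size / step, matching the statement that each element occurs in every window it is assigned to, while the state held per window stays constant in size.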

Automatically reusing results across adjacent windows as an optimization doesn't work well in the presence of user-defined windowing, triggering, out-of-order data, and other optimizations already present in the system.
