移动窗口的最小/最大值能在O(N)中实现吗?

时间:2021-03-31 22:53:55

I have input array A

我有输入数组A

 A[0], A[1], ... , A[N-1]

I want function Max(T,A) which return B represent max value on A over previous moving window of size T where

我想要函数Max(T,A)它返回B,表示A上的最大值,比之前的移动窗口T大

 B[i+T] = Max(A[i], A[i+T])

By using max heap to keep track of max value on current moving windows A[i] to A[i+T], this algorithm yields O(N log(T)) worst case.

通过使用max heap跟踪当前移动windows A[i]到A[i+T]上的最大值,该算法得到O(N log(T)最坏情况。

I would like to know is there any better algorithm? Maybe an O(N) algorithm

我想知道有没有更好的算法?也许是一个O(N)的算法

3 个解决方案

#1


21  

O(N) is possible using Deque data structure. It holds pairs (Value; Index).

O(N)可以使用Deque数据结构。它拥有双(价值;索引)。

at every step:

  if (!Deque.Empty) and (Deque.Head.Index <= CurrentIndex - T) then 
     Deque.ExtractHead;
  //Head is too old, it is leaving the window

  while (!Deque.Empty) and (Deque.Tail.Value > CurrentValue) do
     Deque.ExtractTail;
  //remove elements that have no chance to become minimum in the window

  Deque.AddTail(CurrentValue, CurrentIndex); 
  CurrentMin = Deque.Head.Value
  //Head value is minimum in the current window

#2


6  

it's called RMQ(range minimum query). Actually i once wrote an article about that(with c++ code). See http://attiix.com/2011/08/22/4-ways-to-solve-%C2%B11-rmq/

它被称为RMQ(范围最小查询)。实际上,我曾经写过一篇关于这方面的文章(使用c++代码)。参见http://attiix.com/2011/08/22/4-ways-to-solve-%C2%B11-rmq/

or you may prefer the wikipedia, Range Minimum Query

或者你可能更喜欢wikipedia, Range Minimum查询

after the preparation, you can get the max number of any given range in O(1)

在准备之后,您可以获得O(1)中任意给定范围的最大值

#3


1  

There is a sub-field in image processing called Mathematical Morphology. The operation you are implementing is a core concept in this field, called dilation. Obviously, this operation has been studied extensively and we know how to implement it very efficiently.

在图像处理中有一个子域叫做数学形态学。您正在实现的操作是这个领域的一个核心概念,称为膨胀。显然,这一操作已经被广泛研究,我们知道如何高效地实现它。

The most efficient algorithm for this problem was proposed in 1992 and 1993, independently by van Herk, and Gil and Werman. This algorithm needs exactly 3 comparisons per sample, independently of the size of T.

这个问题最有效的算法是1992年和1993年分别由van Herk、Gil和Werman提出的。这个算法需要每个样本进行3次比较,与T的大小无关。

Some years later, Gil and Kimmel further refined the algorithm to need only 2.5 comparisons per sample. Though the increased complexity of the method might offset the fewer comparisons (I find that more complex code runs more slowly). I have never implemented this variant.

几年后,Gil和Kimmel进一步改进了算法,每个样本只需要2.5个比较。虽然方法的复杂性增加可能会抵消比较的减少(我发现更复杂的代码运行得更慢)。我从未实现过这种变体。

The HGW algorithm, as it's called, needs two intermediate buffers of the same size as the input. For ridiculously large inputs (billions of samples), you could split up the data into chunks and process it chunk-wise.

HGW算法,顾名思义,需要两个与输入大小相同的中间缓冲区。对于非常大的输入(数十亿个样本),您可以将数据分成块,并将其处理成块。

In sort, you walk through the data forward, computing the cumulative max over chunks of size T. You do the same walking backward. Each of these require one comparison per sample. Finally, the result is the maximum over one value in each of these two temporary arrays. For data locality, you can do the two passes over the input at the same time.

在排序中,您向前遍历数据,计算大小为t的块的累计最大值,然后向后遍历数据。每个样本都需要一个比较。最后,结果是这两个临时数组中每个值的最大值。对于数据局部性,您可以同时对输入进行两次传递。

I guess you could even do a running version, where the temporary arrays are of length 2*T, but that would be more complex to implement.

我猜你甚至可以做一个正在运行的版本,其中临时数组的长度是2*T,但是实现起来会更复杂。

  • van Herk, "A fast algorithm for local minimum and maximum filters on rectangular and octagonal kernels", Pattern Recognition Letters 13(7):517-521, 1992 (doi)

    范荷克,“矩形和八角形核上局部最小和最大滤波器的快速算法”,模式识别字母13(7):517-521,1992 (doi)

  • Gil, Werman, "Computing 2-D min, median, and max filters", IEEE Transactions on Pattern Analysis and Machine Intelligence 15(5):504-507 , 1993 (doi)

    “计算二维最小值、中值和最大滤波器”,IEEE模式分析与机器智能15(5):504-507,1993 (doi)

  • Gil, Kimmel, "Efficient dilation, erosion, opening, and closing algorithms", IEEE Transactions on Pattern Analysis and Machine Intelligence 24(12):1606-1617, 2002 (doi)

    《有效扩张、侵蚀、开放和关闭算法》,IEEE模式分析与机器智能24(12):1606-1617,2002 (doi)

(Note: cross-posted from this related question on Code Review.)

(注:与此相关的代码审查问题交叉张贴。)

#1


21  

O(N) is possible using Deque data structure. It holds pairs (Value; Index).

O(N)可以使用Deque数据结构。它拥有双(价值;索引)。

at every step:

  if (!Deque.Empty) and (Deque.Head.Index <= CurrentIndex - T) then 
     Deque.ExtractHead;
  //Head is too old, it is leaving the window

  while (!Deque.Empty) and (Deque.Tail.Value > CurrentValue) do
     Deque.ExtractTail;
  //remove elements that have no chance to become minimum in the window

  Deque.AddTail(CurrentValue, CurrentIndex); 
  CurrentMin = Deque.Head.Value
  //Head value is minimum in the current window

#2


6  

it's called RMQ(range minimum query). Actually i once wrote an article about that(with c++ code). See http://attiix.com/2011/08/22/4-ways-to-solve-%C2%B11-rmq/

它被称为RMQ(范围最小查询)。实际上,我曾经写过一篇关于这方面的文章(使用c++代码)。参见http://attiix.com/2011/08/22/4-ways-to-solve-%C2%B11-rmq/

or you may prefer the wikipedia, Range Minimum Query

或者你可能更喜欢wikipedia, Range Minimum查询

after the preparation, you can get the max number of any given range in O(1)

在准备之后,您可以获得O(1)中任意给定范围的最大值

#3


1  

There is a sub-field in image processing called Mathematical Morphology. The operation you are implementing is a core concept in this field, called dilation. Obviously, this operation has been studied extensively and we know how to implement it very efficiently.

在图像处理中有一个子域叫做数学形态学。您正在实现的操作是这个领域的一个核心概念,称为膨胀。显然,这一操作已经被广泛研究,我们知道如何高效地实现它。

The most efficient algorithm for this problem was proposed in 1992 and 1993, independently by van Herk, and Gil and Werman. This algorithm needs exactly 3 comparisons per sample, independently of the size of T.

这个问题最有效的算法是1992年和1993年分别由van Herk、Gil和Werman提出的。这个算法需要每个样本进行3次比较,与T的大小无关。

Some years later, Gil and Kimmel further refined the algorithm to need only 2.5 comparisons per sample. Though the increased complexity of the method might offset the fewer comparisons (I find that more complex code runs more slowly). I have never implemented this variant.

几年后,Gil和Kimmel进一步改进了算法,每个样本只需要2.5个比较。虽然方法的复杂性增加可能会抵消比较的减少(我发现更复杂的代码运行得更慢)。我从未实现过这种变体。

The HGW algorithm, as it's called, needs two intermediate buffers of the same size as the input. For ridiculously large inputs (billions of samples), you could split up the data into chunks and process it chunk-wise.

HGW算法,顾名思义,需要两个与输入大小相同的中间缓冲区。对于非常大的输入(数十亿个样本),您可以将数据分成块,并将其处理成块。

In sort, you walk through the data forward, computing the cumulative max over chunks of size T. You do the same walking backward. Each of these require one comparison per sample. Finally, the result is the maximum over one value in each of these two temporary arrays. For data locality, you can do the two passes over the input at the same time.

在排序中,您向前遍历数据,计算大小为t的块的累计最大值,然后向后遍历数据。每个样本都需要一个比较。最后,结果是这两个临时数组中每个值的最大值。对于数据局部性,您可以同时对输入进行两次传递。

I guess you could even do a running version, where the temporary arrays are of length 2*T, but that would be more complex to implement.

我猜你甚至可以做一个正在运行的版本,其中临时数组的长度是2*T,但是实现起来会更复杂。

  • van Herk, "A fast algorithm for local minimum and maximum filters on rectangular and octagonal kernels", Pattern Recognition Letters 13(7):517-521, 1992 (doi)

    范荷克,“矩形和八角形核上局部最小和最大滤波器的快速算法”,模式识别字母13(7):517-521,1992 (doi)

  • Gil, Werman, "Computing 2-D min, median, and max filters", IEEE Transactions on Pattern Analysis and Machine Intelligence 15(5):504-507 , 1993 (doi)

    “计算二维最小值、中值和最大滤波器”,IEEE模式分析与机器智能15(5):504-507,1993 (doi)

  • Gil, Kimmel, "Efficient dilation, erosion, opening, and closing algorithms", IEEE Transactions on Pattern Analysis and Machine Intelligence 24(12):1606-1617, 2002 (doi)

    《有效扩张、侵蚀、开放和关闭算法》,IEEE模式分析与机器智能24(12):1606-1617,2002 (doi)

(Note: cross-posted from this related question on Code Review.)

(注:与此相关的代码审查问题交叉张贴。)