谷歌访问:在给定的整数数组中找到所有连续的子序列,它们的和在给定的范围内。我们能做比O(n ^ 2)?

时间:2022-04-18 21:46:48

Given an array of Integers, and a range (low, high), find all contiguous subsequence in the array which have sum in the range.

给定一个整数数组和一个范围(低,高),在数组中找到所有的连续子序列,它们在这个范围内是和。

Is there a solution better than O(n^2)?

有一个解决方案比O(n ^ 2)?

I tried a lot but couldn't find a solution that does better than O(n^2). Please help me find a better solution or confirm that this is the best we can do.

我尝试了很多但找不到解决方案确实比O(n ^ 2)。请帮我找到一个更好的解决方案或者确认这是我们能做的最好的。

This is what I have right now, I'm assuming the range to be defined as [lo, hi].

这就是我现在所拥有的,我假设这个范围被定义为[lo, hi]。

public static int numOfCombinations(final int[] data, final int lo, final int hi, int beg, int end) {
    int count = 0, sum = data[beg];

    while (beg < data.length && end < data.length) {
       if (sum > hi) {
          break;
       } else {
          if (lo <= sum && sum <= hi) {
            System.out.println("Range found: [" + beg + ", " + end + "]");
            ++count;
          }
          ++end;
          if (end < data.length) {
             sum += data[end];
          }
       }
    }
    return count;
}

public static int numOfCombinations(final int[] data, final int lo, final int hi) {
    int count = 0;

    for (int i = 0; i < data.length; ++i) {
        count += numOfCombinations(data, lo, hi, i, i);
    }

    return count;
}

6 个解决方案

#1


15  

O(n) time solution:

O(n)时间的解决方案:

You can extend the 'two pointer' idea for the 'exact' version of the problem. We will maintain variables a and b such that all intervals on the form xs[i,a), xs[i,a+1), ..., xs[i,b-1) have a sum in the sought after range [lo, hi].

您可以将“两个指针”的概念扩展到问题的“确切”版本。我们将保持变量a和b,这样所有的区间都在xs (i,a), xs[i,a+1)中,…,xs[i,b-1)在寻找范围内有一个和[lo, hi]。

a, b = 0, 0
for i in range(n):
    while a != (n+1) and sum(xs[i:a]) < lo:
        a += 1
    while b != (n+1) and sum(xs[i:b]) <= hi:
        b += 1
    for j in range(a, b):
        print(xs[i:j])

This is actually O(n^2) because of the sum, but we can easily fix that by first calculating the prefix sums ps such that ps[i] = sum(xs[:i]). Then sum(xs[i:j]) is simply ps[j]-ps[i].

这实际上是O(n 2)因为它的和,但是我们可以很容易地解决这个问题,首先计算前缀和ps (i) = sum(xs[:i])。然后求和(xs[i:j])就是ps[j]-ps[i]。

Here is an example of running the above code on [2, 5, 1, 1, 2, 2, 3, 4, 8, 2] with [lo, hi] = [3, 6]:

这里有一个运行上述代码的例子,在[2、5、1、1、2、2、3、4、8、2]和[lo, hi] = [3, 6]:

[5]
[5, 1]
[1, 1, 2]
[1, 1, 2, 2]
[1, 2]
[1, 2, 2]
[2, 2]
[2, 3]
[3]
[4]

This runs in time O(n + t), where t is the size of the output. As some have noticed, the output can be as large as t = n^2, namely if all contiguous subsequences are matched.

这是时间O(n + t), t是输出的大小。正如一些人注意到,输出可以一样大t = n ^ 2,即如果所有连续的子序列匹配。

If we allow writing the output in a compressed format (output pairs a,b of which all subsequences are contiguous) we can get a pure O(n) time algorithm.

如果我们允许以压缩格式(输出对a,其中所有子序列都是连续的)来编写输出,我们就可以得到一个纯O(n)时间算法。

#2


8  

Starting from this problem: find all contiguous sub-sequences that sum to x. What we need is something similar.

从这个问题开始:找到所有与x相加的连续子序列,我们需要的是类似的东西。

For every index i, we can calculate the sum of the segment from 0 to i, which is x. So, the problem now is we need to find from 0 to i - 1, how many segments have sum from (x - low) to (x - high), and it should be faster than O(n). So there are several data structures help you to do that in O(logn), which are Fenwick tree and Interval tree.

我的每个索引,我们可以计算的总和段从0到我,这是x。所以,现在的问题是我们需要找到我从0到- 1,有多少段从(x -低)到(x -高),它应该比O(n)。有几个数据结构可以帮助你在O(logn)中实现,这是Fenwick树和区间树。

So what we need to do is:

所以我们需要做的是:

  • Iterating through all index from 0 to n (n is the size of the array).

    遍历所有索引从0到n (n是数组的大小)。

  • At index ith, calculate, starting from 0 to ith index, the sum x, query the tree to get the total occurrences of numbers fall in the range (x - high, x - low).

    在索引ith中,计算,从0到第i个索引,和x,查询树,以得到在范围(x - high, x - low)的总次数。

  • Add x to the tree.

    把x加到树里。

So the time complexity will be O(n log n)

所以时间复杂度是O(n log n)

#3


5  

You should use a simple dynamic programming and binary search. To find the count:

您应该使用简单的动态编程和二分查找。找到数:

    from bisect import bisect_left, bisect_right

    def solve(A, start, end):
        """
        O(n lg n) Binary Search
        Bound:
        f[i] - f[j] = start
        f[i] - f[j'] = end
        start < end
        f[j] > f[j']

        :param A: an integer array
        :param start: lower bound
        :param end: upper bound 
        :return:
        """
        n = len(A)
        cnt = 0
        f = [0 for _ in xrange(n+1)]

        for i in xrange(1, n+1):
            f[i] = f[i-1]+A[i-1]  # sum from left

        f.sort()
        for i in xrange(n+1):
            lo = bisect_left(f, f[i]-end, 0, i)
            hi = bisect_right(f, f[i]-start, 0, i)
            cnt += hi-lo

        return cnt

https://github.com/algorhythms/LintCode/blob/master/Subarray%20Sum%20II.py

https://github.com/algorhythms/LintCode/blob/master/Subarray%20Sum%20II.py

To find the results rather the count, you just need another hash table to store the mapping from original (not sorted) f[i] -> list of indexes.

为了查找结果,您只需要另一个哈希表来存储来自原始(未排序)f[i] ->索引列表的映射。

Cheers.

欢呼。

#4


0  

Here is way you can get O(nlogn) if there are only positive numbers :-

如果只有正数,你可以得到O(nlogn):-。

1. Evaluate cumulative sum of array
2. for i  find total sum[j] in (sum[i]+low,sum[i]+high) using binary search
3. Total = Total + count
4. do 3 to 5 for all i

Time complexity:-

时间复杂度:-

Cumulative sum is O(N)
Finding sums in range is O(logN) using binary search
Total Time complexity is O(NlogN)

#5


0  

If all integers are non-negative, then it can be done in O(max(size-of-input,size-of-output)) time. This is optimal.

如果所有的整数都是非负的,那么它可以在O(最大值,输入,大小,输出)时间内完成。这是最优的。

Here's the algorithm in C.

这是C中的算法。

void interview_question (int* a, int N, int lo, int hi)
{
  int sum_bottom_low = 0, sum_bottom_high = 0,
      bottom_low = 0, bottom_high = 0,
      top = 0;
  int i;

  if (lo == 0) printf ("[0 0) ");
  while (top < N)
  {
    sum_bottom_low += a[top];
    sum_bottom_high += a[top];
    top++;
    while (sum_bottom_high >= lo && bottom_high <= top)
    {
      sum_bottom_high -= a[bottom_high++];
    }
    while (sum_bottom_low > hi && bottom_low <= bottom_high)
    {
      sum_bottom_low -= a[bottom_low++];
    }
    // print output
    for (i = bottom_low; i < bottom_high; ++i)
      printf ("[%d %d) ", i, top);
  }
  printf("\n");
}

Except for the last loop marked "print output", each operation is executed O(N) times; the last loop is executed once for each interval printed. If we only need to count the intervals and not print them, the entire algorithm becomes O(N).

除最后一个标记为“打印输出”的循环外,每个操作执行O(N)次;最后一个循环在每次打印时执行一次。如果我们只需要计算间隔而不是打印它们,那么整个算法就变成O(N)了。

If negative numbers are allowed, then O(N^2) is hard to beat (might be impossible).

如果允许负数,那么O(N ^ 2)很难击败(几乎不可能)。

#6


0  

yes in my opinion it can be in O(n)

struct subsequence
{
int first,last,sum;
}s;

function(array,low,high)
{
int till_max=0;
s.first=0;s.last=0;s.sum=0;
for(i=low;i<high;i++)
{

if(till_max+array[i]>array[i])
{
s.first=s.first;
s.last=i;
till_max+=array[i];
}
else
{
s.first=i;
s.last=i;
till_max=array[i];
}
if(till_max in range)
{
s.sum=till_max;
   printf("print values between first=%d and last=%d and sum=%d",s.first,s.last,s.sum);
}
}
}

#1


15  

O(n) time solution:

O(n)时间的解决方案:

You can extend the 'two pointer' idea for the 'exact' version of the problem. We will maintain variables a and b such that all intervals on the form xs[i,a), xs[i,a+1), ..., xs[i,b-1) have a sum in the sought after range [lo, hi].

您可以将“两个指针”的概念扩展到问题的“确切”版本。我们将保持变量a和b,这样所有的区间都在xs (i,a), xs[i,a+1)中,…,xs[i,b-1)在寻找范围内有一个和[lo, hi]。

a, b = 0, 0
for i in range(n):
    while a != (n+1) and sum(xs[i:a]) < lo:
        a += 1
    while b != (n+1) and sum(xs[i:b]) <= hi:
        b += 1
    for j in range(a, b):
        print(xs[i:j])

This is actually O(n^2) because of the sum, but we can easily fix that by first calculating the prefix sums ps such that ps[i] = sum(xs[:i]). Then sum(xs[i:j]) is simply ps[j]-ps[i].

这实际上是O(n 2)因为它的和,但是我们可以很容易地解决这个问题,首先计算前缀和ps (i) = sum(xs[:i])。然后求和(xs[i:j])就是ps[j]-ps[i]。

Here is an example of running the above code on [2, 5, 1, 1, 2, 2, 3, 4, 8, 2] with [lo, hi] = [3, 6]:

这里有一个运行上述代码的例子,在[2、5、1、1、2、2、3、4、8、2]和[lo, hi] = [3, 6]:

[5]
[5, 1]
[1, 1, 2]
[1, 1, 2, 2]
[1, 2]
[1, 2, 2]
[2, 2]
[2, 3]
[3]
[4]

This runs in time O(n + t), where t is the size of the output. As some have noticed, the output can be as large as t = n^2, namely if all contiguous subsequences are matched.

这是时间O(n + t), t是输出的大小。正如一些人注意到,输出可以一样大t = n ^ 2,即如果所有连续的子序列匹配。

If we allow writing the output in a compressed format (output pairs a,b of which all subsequences are contiguous) we can get a pure O(n) time algorithm.

如果我们允许以压缩格式(输出对a,其中所有子序列都是连续的)来编写输出,我们就可以得到一个纯O(n)时间算法。

#2


8  

Starting from this problem: find all contiguous sub-sequences that sum to x. What we need is something similar.

从这个问题开始:找到所有与x相加的连续子序列,我们需要的是类似的东西。

For every index i, we can calculate the sum of the segment from 0 to i, which is x. So, the problem now is we need to find from 0 to i - 1, how many segments have sum from (x - low) to (x - high), and it should be faster than O(n). So there are several data structures help you to do that in O(logn), which are Fenwick tree and Interval tree.

我的每个索引,我们可以计算的总和段从0到我,这是x。所以,现在的问题是我们需要找到我从0到- 1,有多少段从(x -低)到(x -高),它应该比O(n)。有几个数据结构可以帮助你在O(logn)中实现,这是Fenwick树和区间树。

So what we need to do is:

所以我们需要做的是:

  • Iterating through all index from 0 to n (n is the size of the array).

    遍历所有索引从0到n (n是数组的大小)。

  • At index ith, calculate, starting from 0 to ith index, the sum x, query the tree to get the total occurrences of numbers fall in the range (x - high, x - low).

    在索引ith中,计算,从0到第i个索引,和x,查询树,以得到在范围(x - high, x - low)的总次数。

  • Add x to the tree.

    把x加到树里。

So the time complexity will be O(n log n)

所以时间复杂度是O(n log n)

#3


5  

You should use a simple dynamic programming and binary search. To find the count:

您应该使用简单的动态编程和二分查找。找到数:

    from bisect import bisect_left, bisect_right

    def solve(A, start, end):
        """
        O(n lg n) Binary Search
        Bound:
        f[i] - f[j] = start
        f[i] - f[j'] = end
        start < end
        f[j] > f[j']

        :param A: an integer array
        :param start: lower bound
        :param end: upper bound 
        :return:
        """
        n = len(A)
        cnt = 0
        f = [0 for _ in xrange(n+1)]

        for i in xrange(1, n+1):
            f[i] = f[i-1]+A[i-1]  # sum from left

        f.sort()
        for i in xrange(n+1):
            lo = bisect_left(f, f[i]-end, 0, i)
            hi = bisect_right(f, f[i]-start, 0, i)
            cnt += hi-lo

        return cnt

https://github.com/algorhythms/LintCode/blob/master/Subarray%20Sum%20II.py

https://github.com/algorhythms/LintCode/blob/master/Subarray%20Sum%20II.py

To find the results rather the count, you just need another hash table to store the mapping from original (not sorted) f[i] -> list of indexes.

为了查找结果,您只需要另一个哈希表来存储来自原始(未排序)f[i] ->索引列表的映射。

Cheers.

欢呼。

#4


0  

Here is way you can get O(nlogn) if there are only positive numbers :-

如果只有正数,你可以得到O(nlogn):-。

1. Evaluate cumulative sum of array
2. for i  find total sum[j] in (sum[i]+low,sum[i]+high) using binary search
3. Total = Total + count
4. do 3 to 5 for all i

Time complexity:-

时间复杂度:-

Cumulative sum is O(N)
Finding sums in range is O(logN) using binary search
Total Time complexity is O(NlogN)

#5


0  

If all integers are non-negative, then it can be done in O(max(size-of-input,size-of-output)) time. This is optimal.

如果所有的整数都是非负的,那么它可以在O(最大值,输入,大小,输出)时间内完成。这是最优的。

Here's the algorithm in C.

这是C中的算法。

void interview_question (int* a, int N, int lo, int hi)
{
  int sum_bottom_low = 0, sum_bottom_high = 0,
      bottom_low = 0, bottom_high = 0,
      top = 0;
  int i;

  if (lo == 0) printf ("[0 0) ");
  while (top < N)
  {
    sum_bottom_low += a[top];
    sum_bottom_high += a[top];
    top++;
    while (sum_bottom_high >= lo && bottom_high <= top)
    {
      sum_bottom_high -= a[bottom_high++];
    }
    while (sum_bottom_low > hi && bottom_low <= bottom_high)
    {
      sum_bottom_low -= a[bottom_low++];
    }
    // print output
    for (i = bottom_low; i < bottom_high; ++i)
      printf ("[%d %d) ", i, top);
  }
  printf("\n");
}

Except for the last loop marked "print output", each operation is executed O(N) times; the last loop is executed once for each interval printed. If we only need to count the intervals and not print them, the entire algorithm becomes O(N).

除最后一个标记为“打印输出”的循环外,每个操作执行O(N)次;最后一个循环在每次打印时执行一次。如果我们只需要计算间隔而不是打印它们,那么整个算法就变成O(N)了。

If negative numbers are allowed, then O(N^2) is hard to beat (might be impossible).

如果允许负数,那么O(N ^ 2)很难击败(几乎不可能)。

#6


0  

yes in my opinion it can be in O(n)

struct subsequence
{
int first,last,sum;
}s;

function(array,low,high)
{
int till_max=0;
s.first=0;s.last=0;s.sum=0;
for(i=low;i<high;i++)
{

if(till_max+array[i]>array[i])
{
s.first=s.first;
s.last=i;
till_max+=array[i];
}
else
{
s.first=i;
s.last=i;
till_max=array[i];
}
if(till_max in range)
{
s.sum=till_max;
   printf("print values between first=%d and last=%d and sum=%d",s.first,s.last,s.sum);
}
}
}