在O(log n)时间内查找连续整数数组中的重复元素

时间:2021-05-19 15:57:17

A couple of days ago, in an interview I was asked for a program that would find the single duplicate element in a consecutive integer array in O(log n) time.

几天前,在一次采访中,我被要求提供一个程序,该程序将在O(log n)时间内在连续的整数数组中找到单个重复元素。

The case is somewhat specific, as there are a total of 11 integers (1 to 10, in that order) plus, a single duplicate of any of these numbers, inserted somewhere in between.

这种情况有点具体,因为总共有11个整数(1到10,按此顺序)加上,这些数字中的任何一个的单个副本插入其间的某个位置。

I was given a sample similar to this:

我得到了一个类似的样本:

{1, 2, 3, 4, 5, 6, 2, 7, 8, 9, 10}

{1,2,3,4,5,6,2,7,8,9,10}

So, now I've come up with the following C code:

所以,现在我想出了以下C代码:

#include <stdio.h>

int main (void)
{
    int a[11] = {1, 2, 3, 4, 5, 6, 2, 7, 8, 9, 10}, first=0, last=10, mid;

    while (1)
    {
        mid = (first + last)/2;

        if (mid+1 == a[mid])
            first = mid+1;

        else if ((mid == 0) || (mid == 10) || (a[mid+1] - a[mid-1] == 1))   /* Order is Important, (a[mid+1] - a[mid-1] == 1) must be last */
            break;

        else
            last = mid-1;
    }

    printf("Duplicate element is %d, located at position no. %d.", a[mid], mid+1);

    return 0;
}

Would this properly satisfy the O(log n) criteria? And, are there any alternatives/improvements over this?

这是否恰当地满足了O(log n)标准?而且,有没有替代/改进?

2 个解决方案

#1


3  

Yes, it has O(log n) time complexity.

是的,它具有O(log n)时间复杂度。

It is possible to make the code more clear using the following fact: you need to find the smallest i such that a[i] != i + 1, so it can be implemented in a more concise way:

使用以下事实可以使代码更清晰:您需要找到最小的i,使得a [i]!= i + 1,因此可以更简洁的方式实现:

//invariant: the [0...low] prefix does not contain a duplicate
//           the [0...high] prefix contains a duplicate
low = 0 //[0...0] prefix obviously does not contain a duplicate
high = 10 //the entire array obviously contains a duplicate
while high - low > 1:
    mid = (low + high) / 2 
    if a[mid] != mid + 1:
        high = mid
    else:
        low = mid
print(a[high], high)

#2


1  

We can modify the binary search algorithm to get the solution. In binary search, we have a key and we used this key to find its position by bisecting the array size. Here, we don't have a key, instead we have to find it. But the behavior of duplicate element can be used to bisect the array size. How ? lets see:

我们可以修改二进制搜索算法来获得解决方案。在二进制搜索中,我们有一个键,我们使用此键通过平分数组大小来找到它的位置。在这里,我们没有钥匙,相反,我们必须找到它。但是重复元素的行为可以用于平分数组大小。怎么样 ?让我们来看看:

On carefully looking the data, we can easily see that after inserting the duplicate element at random position (say index) in consecutive element array the property of elements will change (a[i] == i+1 --> a[i] != i+1) from the position index (including the index). Now this changed property can be used to bisect the array size. Hence, we can find out the duplicate in O(log(n)).

仔细查看数据后,我们可以很容易地看到,在连续元素数组中的随机位置(比如索引)中插入重复元素后,元素的属性将发生变化(a [i] == i + 1 - > a [i] != i + 1)来自位置索引(包括索引)。现在,此更改的属性可用于平分数组大小。因此,我们可以在O(log(n))中找出副本。

For example, Consider your given array: {1, 2, 3, 4, 5, 6, 2, 7, 8, 9, 10}

例如,考虑您给定的数组:{1,2,3,4,5,6,2,7,8,9,10}

{1,  2,  3,   4,  5,  6,  2,  7,  8,  9,  10}
                          ||
                          from this position the the property of (a[i] == i+1) will no more satisfied.

This property can be model to bisect the array size in solution.

此属性可以是模型,以在解决方案中平分数组大小。

void  binary_duplictae_finder(int a[], int low, int high) {

   int mid=(low+high)/2;

   if(high - low > 1){
          if(a[mid]!=mid+1)
              binary_duplictae_finder(a, low, mid);
          else
              binary_duplictae_finder(a, mid, high);
   }

   if(high==low+1)
      printf("%d ", a[high]);
 }

#1


3  

Yes, it has O(log n) time complexity.

是的,它具有O(log n)时间复杂度。

It is possible to make the code more clear using the following fact: you need to find the smallest i such that a[i] != i + 1, so it can be implemented in a more concise way:

使用以下事实可以使代码更清晰:您需要找到最小的i,使得a [i]!= i + 1,因此可以更简洁的方式实现:

//invariant: the [0...low] prefix does not contain a duplicate
//           the [0...high] prefix contains a duplicate
low = 0 //[0...0] prefix obviously does not contain a duplicate
high = 10 //the entire array obviously contains a duplicate
while high - low > 1:
    mid = (low + high) / 2 
    if a[mid] != mid + 1:
        high = mid
    else:
        low = mid
print(a[high], high)

#2


1  

We can modify the binary search algorithm to get the solution. In binary search, we have a key and we used this key to find its position by bisecting the array size. Here, we don't have a key, instead we have to find it. But the behavior of duplicate element can be used to bisect the array size. How ? lets see:

我们可以修改二进制搜索算法来获得解决方案。在二进制搜索中,我们有一个键,我们使用此键通过平分数组大小来找到它的位置。在这里,我们没有钥匙,相反,我们必须找到它。但是重复元素的行为可以用于平分数组大小。怎么样 ?让我们来看看:

On carefully looking the data, we can easily see that after inserting the duplicate element at random position (say index) in consecutive element array the property of elements will change (a[i] == i+1 --> a[i] != i+1) from the position index (including the index). Now this changed property can be used to bisect the array size. Hence, we can find out the duplicate in O(log(n)).

仔细查看数据后,我们可以很容易地看到,在连续元素数组中的随机位置(比如索引)中插入重复元素后,元素的属性将发生变化(a [i] == i + 1 - > a [i] != i + 1)来自位置索引(包括索引)。现在,此更改的属性可用于平分数组大小。因此,我们可以在O(log(n))中找出副本。

For example, Consider your given array: {1, 2, 3, 4, 5, 6, 2, 7, 8, 9, 10}

例如,考虑您给定的数组:{1,2,3,4,5,6,2,7,8,9,10}

{1,  2,  3,   4,  5,  6,  2,  7,  8,  9,  10}
                          ||
                          from this position the the property of (a[i] == i+1) will no more satisfied.

This property can be model to bisect the array size in solution.

此属性可以是模型,以在解决方案中平分数组大小。

void  binary_duplictae_finder(int a[], int low, int high) {

   int mid=(low+high)/2;

   if(high - low > 1){
          if(a[mid]!=mid+1)
              binary_duplictae_finder(a, low, mid);
          else
              binary_duplictae_finder(a, mid, high);
   }

   if(high==low+1)
      printf("%d ", a[high]);
 }