Array of size n where one element appears more than n/2 times

Time: 2022-07-09 21:41:58

Given an array of n integers in which one element appears more than n/2 times, we need to find that element in linear time and constant extra space.

YAAQ: Yet another arrays question.

9 solutions

#1


I have a sneaking suspicion it's something along these lines (in C#):

// We don't need an array
public int FindMostFrequentElement(IEnumerable<int> sequence)
{
    // Initial value is irrelevant if sequence is non-empty,
    // but keeps compiler happy.
    int best = 0; 
    int count = 0;

    foreach (int element in sequence)
    {
        if (count == 0)
        {
            best = element;
            count = 1;
        }
        else
        {
            // Vote current choice up or down
            count += (best == element) ? 1 : -1;
        }
    }
    return best;
}

It sounds unlikely to work, but it does. (Proof as a postscript file, courtesy of Boyer/Moore.)
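
To make the vote easier to follow, here is a sketch in C. It ports the C# loop above and also records the (candidate, count) pair after each element. The trace arrays `cand` and `count`, and the function name `majority_vote_trace`, are illustrative additions and not part of the original answer.

```c
#include <stddef.h>

/* C port of the C# voting loop, instrumented for illustration: after
   processing a[i], cand[i] and count[i] hold the current candidate and
   its vote count. The returned candidate is only guaranteed to be the
   majority element if a majority actually exists; otherwise a second
   counting pass is needed to confirm it. */
int majority_vote_trace(const int *a, size_t n, int *cand, int *count) {
    int best = 0;
    int c = 0;
    for (size_t i = 0; i < n; i++) {
        if (c == 0) {
            best = a[i];                    /* no surviving votes: adopt a[i] */
            c = 1;
        } else {
            c += (best == a[i]) ? 1 : -1;   /* vote current choice up or down */
        }
        cand[i] = best;
        count[i] = c;
    }
    return best;
}
```

For {3, 1, 1, 2, 1}, the trace is (3,1), (3,0), (1,1), (1,0), (1,1). Each mismatch cancels one vote. The value 1 has more copies than all other values combined, so it survives every cancellation.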

#2


Find the median; that takes O(n) on an unsorted array (for example with the median-of-medians selection algorithm). Since more than n/2 elements are equal to the same value, the median is equal to that value as well.
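
A minimal sketch of the median idea in C follows. It swaps in quickselect with a three-way partition, which is not the median-of-medians algorithm. Quickselect runs in linear time on typical inputs, but with this deterministic middle pivot an adversarial input can still drive it to O(n^2); the guaranteed O(n) bound needs median-of-medians, which is not shown here. The sketch also reorders the array. The names `quickselect` and `majority_by_median` are hypothetical.

```c
#include <stddef.h>

static void swap_int(int *x, int *y) {
    int t = *x;
    *x = *y;
    *y = t;
}

/* Returns the k-th smallest element (0-based) of a[0, n), reordering a.
   Iterative, with a three-way (Dutch national flag) partition, so long runs
   of equal values (guaranteed when a majority exists) end the search quickly.
   Requires k < n. */
int quickselect(int *a, size_t n, size_t k) {
    size_t lo = 0, hi = n;                      /* search window [lo, hi) */
    while (hi - lo > 1) {
        int pivot = a[lo + (hi - lo) / 2];
        size_t lt = lo, i = lo, gt = hi;
        /* Invariant: [lo,lt) < pivot, [lt,i) == pivot, [gt,hi) > pivot. */
        while (i < gt) {
            if (a[i] < pivot) {
                swap_int(&a[lt++], &a[i++]);
            } else if (a[i] > pivot) {
                swap_int(&a[i], &a[--gt]);
            } else {
                i++;
            }
        }
        if (k < lt) hi = lt;
        else if (k >= gt) lo = gt;
        else return pivot;                      /* k falls among the pivot copies */
    }
    return a[lo];
}

/* Index n/2 of the sorted order is the median; a value occurring more than
   n/2 times must occupy that index. */
int majority_by_median(int *a, size_t n) {
    return quickselect(a, n, n / 2);
}
```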

#3


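/* Returns 1 and stores the leader in *leader_out if some value occurs
   more than n/2 times in x[0..n-1]; otherwise returns 0. */
int findLeader(int n, const int *x, int *leader_out){
    if(n <= 0) return 0;                /* empty array: no leader */
    int leader = x[0], c = 1, i;
    /* Phase 1: Boyer-Moore vote to find the only possible candidate. */
    for(i=1; i<n; i++){
        if(c == 0){
            leader = x[i];
            c = 1;
        } else {
            if(x[i] == leader) c++;
            else c--;
        }
    }

    if(c == 0) return 0;                /* a true majority always leaves c > 0 */
    /* Phase 2: verify the candidate occurs more than n/2 times. */
    c = 0;
    for(i=0; i<n; i++){
        if(x[i] == leader) c++;
    }
    if(c > n/2){
        *leader_out = leader;
        return 1;
    }
    return 0;
}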

I'm not the author of this code, but this will work for your problem. The first part looks for a potential leader, the second checks if it appears more than n/2 times in the array.

#4


Well, you can do an in-place radix sort as described here [pdf]; for fixed-width integer keys this takes constant extra space and linear time. Then you can make a single pass counting runs of consecutive equal elements, stopping as soon as a count exceeds n/2.
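
A rough sketch of this in C is below. It assumes 32-bit `int32_t` keys. As a simple stand-in for whatever algorithm the linked PDF describes, it uses an in-place MSD binary radix sort, which partitions on one bit per level. The recursion depth is bounded by the key width (32), so extra space is constant for fixed-width keys and time is O(32·n). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Map a signed 32-bit value to an unsigned key with the same ordering. */
static uint32_t key_of(int32_t v) { return (uint32_t)v ^ UINT32_C(0x80000000); }

/* In-place MSD binary radix sort of a[lo, hi) on bits `bit` down to 0. */
static void radix_sort_bits(int32_t *a, size_t lo, size_t hi, int bit) {
    if (hi - lo < 2 || bit < 0) return;
    size_t i = lo, j = hi;           /* [lo,i) has the bit clear, [j,hi) has it set */
    while (i < j) {
        if (((key_of(a[i]) >> bit) & 1u) == 0) {
            i++;
        } else {
            j--;
            int32_t t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }
    radix_sort_bits(a, lo, i, bit - 1);
    radix_sort_bits(a, i, hi, bit - 1);
}

/* Sorts a in place, then scans runs of equal values. Returns 1 and sets *out
   if some value occurs more than n/2 times, otherwise 0. */
int majority_by_radix_sort(int32_t *a, size_t n, int32_t *out) {
    radix_sort_bits(a, 0, n, 31);
    size_t run = 0;
    for (size_t i = 0; i < n; i++) {
        run = (i > 0 && a[i] == a[i - 1]) ? run + 1 : 1;
        if (run > n / 2) { *out = a[i]; return 1; }
    }
    return 0;
}
```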

#5


This is what I thought initially.

I made an attempt to keep the invariant "one element appears more than n/2 times", while reducing the problem set.

Let's start by comparing a[i] and a[i+1]. If they're equal, we compare a[i+1] and a[i+2]. If not, we remove both a[i] and a[i+1] from the array. We repeat this until i >= (current size)/2. At this point we'll have 'THE' element occupying the first (current size)/2 positions. This would maintain the invariant.

The only caveat is that we assume the array is stored as a linked list (so that each removal is O(1), giving O(n) overall).
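
A related pair-elimination technique works on a plain array without a linked list. It is not bhupi's exact procedure. Pair up a[0],a[1], a[2],a[3], and so on, discard each pair of differing values, and keep one copy from each equal pair; the majority element stays a strict majority of what is kept. If the size is odd, the unpaired last element is first tested with a count. Each round at least halves the size, so the total work is O(n). The sketch below is destructive. Like the question, it assumes a majority exists; without that guarantee, a final counting pass over an untouched copy of the input would be needed. The names are hypothetical.

```c
#include <stddef.h>

/* Counts occurrences of v in a[0, n). */
static size_t count_of(const int *a, size_t n, int v) {
    size_t c = 0;
    for (size_t i = 0; i < n; i++) c += (a[i] == v);
    return c;
}

/* In-place pairwise halving. Returns 1 and sets *out to the surviving
   candidate, or 0 if everything cancels out (no majority). */
int majority_by_pairing(int *a, size_t n, int *out) {
    size_t m = n;
    while (m > 1) {
        if (m % 2 == 1) {
            int last = a[m - 1];             /* unpaired element: test it directly */
            if (count_of(a, m, last) > m / 2) { *out = last; return 1; }
            m--;                             /* not the majority, so dropping it is safe */
        }
        size_t kept = 0;
        for (size_t i = 0; i + 1 < m; i += 2) {
            if (a[i] == a[i + 1]) a[kept++] = a[i];  /* keep one of each equal pair */
        }
        m = kept;                            /* writes never pass unread pairs */
    }
    if (m == 1) { *out = a[0]; return 1; }
    return 0;
}
```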

What say folks?

-bhupi

#6


How about this: randomly select a small subset of K elements and look for duplicates (e.g. 4 elements, then 8, etc.). Sampling with replacement, each pick hits the majority element with probability greater than 1/2, so with K == 4 the probability of getting fewer than 2 copies of it is below 5/16; with K == 8 it drops below 9/256, about 3.5%. If you find no duplicates, repeat the process until you do. (This assumes the other elements are more randomly distributed; it would perform very poorly with, say, 49% of the array = "A" and 51% of the array = "B".)
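
As a check on these sampling probabilities, assume the K picks are independent (sampling with replacement). Let p > 1/2 be the fraction of the array taken by the majority element, and let X be the number of picks that hit it. The chance of seeing fewer than two copies is a binomial tail. It shrinks as p grows, so evaluating it at p = 1/2 gives an upper bound:

```latex
\Pr[X \le 1] = (1-p)^K + K\,p\,(1-p)^{K-1}, \qquad X \sim \mathrm{Binomial}(K, p)

p = \tfrac{1}{2}:\quad \Pr[X \le 1] = \frac{K+1}{2^K}, \qquad
K = 4:\ \tfrac{5}{16} \approx 0.31, \qquad K = 8:\ \tfrac{9}{256} \approx 0.035
```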

e.g.:

findDuplicateCandidate:
    select a fixed-size random subset.
    if no element occurs more than once, repeat.
    if more than one element occurs more than once, call findDuplicateCandidate again and choose the element the two calls have in common.
    otherwise, return the most common element in that subset.

This is a constant-time operation (if the data set isn't adversarial), so then do a linear O(N) scan of the array to verify the candidate.
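
A simplified variant of this idea is sketched below in C. It is a swapped-in Las Vegas scheme rather than the K-element subset procedure above. It draws one random element at a time and verifies it immediately with a linear count. If a majority exists, each draw succeeds with probability greater than 1/2, so the expected number of draws is below 2 and the expected time is O(n). The `max_tries` cap and the function names are hypothetical. `rand() % n` has a slight modulo bias, which is acceptable for a sketch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Counts occurrences of v in a[0, n). */
static size_t occurrences(const int *a, size_t n, int v) {
    size_t c = 0;
    for (size_t i = 0; i < n; i++) c += (a[i] == v);
    return c;
}

/* Picks random elements and verifies each with a full count. Returns 1 and
   sets *out on success; returns 0 after max_tries failed picks, which is
   also what happens when no majority exists. */
int majority_by_sampling(const int *a, size_t n, int max_tries, int *out) {
    for (int t = 0; t < max_tries && n > 0; t++) {
        int v = a[(size_t)rand() % n];
        if (occurrences(a, n, v) > n / 2) { *out = v; return 1; }
    }
    return 0;
}
```

With a majority present, each pick fails with probability below 1/2. So all 64 picks in a call like `majority_by_sampling(a, n, 64, &m)` fail with probability below 2^-64.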

#7


My first thought (not sufficient) would be to:

  • Sort the array in place

  • Return the middle element

But sorting would be O(n log n), as would any recursive solution that does linear work at each of log n levels.
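
For comparison, the sort-then-take-the-middle idea is only a few lines in C with the standard `qsort`. The C standard does not specify which algorithm `qsort` uses or what its complexity is; typical implementations run in O(n log n). The function name is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for ints; avoids the overflow that x - y could cause. */
static int cmp_int(const void *p, const void *q) {
    int x = *(const int *)p, y = *(const int *)q;
    return (x > y) - (x < y);
}

/* Sorts a in place and returns the middle element. A value that occurs more
   than n/2 times must occupy index n/2 after sorting. Requires n > 0. */
int majority_by_sorting(int *a, size_t n) {
    qsort(a, n, sizeof a[0], cmp_int);
    return a[n / 2];
}
```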

If you can destructively modify the array (and various other conditions apply), you could do a pass replacing elements with their counts or something. Do you know anything else about the array, and are you allowed to modify it?

Edit: Leaving my answer here for posterity, but I think Skeet's got it.

#8


In PHP (please check whether it's correct):

function arrLeader($A) {
    $len = count($A);
    // Map each value (must be int or string) to its number of occurrences.
    // This table uses O(n) extra space, so it does not meet the constant-space requirement.
    $counts = array_count_values($A);
    foreach ($counts as $val => $occurrences) {
        if ($occurrences > $len / 2) {
            return $val;
        }
    }
    return -1; // no element occurs more than n/2 times
}

#9


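using System;

static class Leader
{
    // O(n log n): sort, take the middle element as the candidate, then count it.
    public static int FastLeader(int[] A)
    {
        int n = A.Length;
        int[] L = new int[n + 1];      // 1-indexed copy; L[0] is a dummy slot
        L[0] = -1;
        for (int i = 0; i < n; i++)
        {
            L[i + 1] = A[i];
        }
        Array.Sort(L, 1, n);            // sort only L[1..n]
        int count = 0;
        int pos = (n + 1) / 2;          // middle position of the sorted range
        int candidate = L[pos];
        for (int i = 1; i <= n; i++)
        {
            if (L[i] == candidate)
                count++;
        }
        if (count > n / 2)
            return candidate;
        return -1;                      // no leader
    }
}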