在数组中找到一个不重复三次的元素?

After reading this interesting question I was reminded of a tricky interview question I had once that I never satisfactorily answered:

读了这个有趣的问题之后，我想起了一个我从未满意回答过的棘手的面试问题:

You are given an array of n 32-bit unsigned integers where each element (except one) is repeated a multiple of three times. In O(n) time and using as little auxiliary space as possible, find the element of the array that does not appear a multiple of three times.

给定一个n 32位无符号整数数组，其中每个元素(除了一个)重复三次。在O(n)时间内，尽量使用尽可能小的辅助空间，查找不出现三次倍数的数组元素。

As an example, given this array:

举个例子，给定这个数组:

1 1 2 2 2 3 3 3 3 3 3

1 2 2 3 3 3 3 3 3 3 3 3。

We would output 1, while given the array

在给定数组的情况下，我们将输出1。

3 2 1 3 2 1 2 3 1 4 4 4 4

3 2 1 3 2 1 2 3 4 4 4 4。

We would output 4.

我们将输出4。

This can easily be solved in O(n) time and O(n) space by using a hash table to count the frequencies of each element, though I strongly suspect that because the problem statement specifically mentioned that the array contains 32-bit unsigned integers that there is a much better solution (I'm guessing O(1) space).

这可以很容易地解决了O(n)时间和O(n)空间通过使用一个哈希表来计算每个元素的频率,我强烈怀疑,因为问题声明特别提到,该数组包含32位无符号整数,有一个更好的解决方案(我猜O(1)空间)。

Does anyone have any ideas on how to solve this?

有人知道如何解决这个问题吗?

3 个解决方案

#1

It can be done in O(n) time and O(1) space.

它可以在O(n)时间和O(1)空间中完成。

Here is how you can do it with constant space in C#. I'm using the idea of "xor except with 3-state bits". For every set bit, the "xor" operation increments the corresponding 3-state value.

下面是在c#中使用常量空间的方法。我用的是“xor除了3个状态位”的概念。对于每一个设置位，“xor”操作将增加相应的3个状态值。

The final output will be the number whose binary representation has 1s in places that are either 1 or 2 in the final value.

最终的输出将是在最终值中有1或2的位置的二进制表示有1的数字。

void Main() {
    Console.WriteLine (FindNonTriple(new uint[] 
                        {1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3} ));
    // 1

    Console.WriteLine (FindNonTriple(new uint[] 
                        {3, 2, 1, 3, 2, 1, 3, 2, 1, 4, 4, 4, 4} ));
    // 4
}

uint FindNonTriple(uint[] args) {
    byte[] occurred = new byte[32];

    foreach (uint val in args) {
        for (int i = 0; i < 32; i++) {
            occurred[i] = (byte)((occurred[i] + (val >> i & 1)) % 3);
        }
    }

    uint result = 0;
    for (int i = 0; i < 32; i++) {
        if (occurred[i] != 0) result |= 1u << i;
    }
    return result;
}

#2

The obvious solution to do it in constant space is to sort it using an in place algorithm, and then scan once over the array.

在恒定空间中，最明显的解决方法是用一个就地算法对它进行排序，然后对数组进行一次扫描。

Sadly this requires usually O(n log n) time and O(1) space.

不幸的是，这通常需要O(n log n)时间和O(1)空间。

But as the entries have a limited key length (32 bit) you can use as sort algorithm radix sort (there exist in place radix sort, they are not stable, but that doesnt matter here). There you have O(n) time and O(1) space.

但由于条目的密钥长度有限(32位)，您可以将其用作排序算法基数排序(存在于位置基数排序，它们不稳定，但这并不重要)。这里有O(n)时间和O(1)空间。

EDIT: Btw you could use this approach to find also ALL numbers that appear not a multiple of 3 times, in case you would allow that more than one number could have this property.

编辑:顺便说一下，你可以使用这个方法来查找所有的数字，而不是3倍的倍数，如果你允许超过一个数字可以拥有这个属性。

#3

You're looking for an item with a rep-count that's non-zero (mod 3). I think I'd do it recursively:

你正在寻找一个非零的重复计数的项(mod 3)，我认为我应该递归地做它:

split the array in half
把数组分成两半。
find items with rep count that's non-zero (mod 3) in each half
查找具有代表数的项目，在每一半中为非零(mod 3)。
merge the halves, keeping counts for unequal keys, and adding the counts for equal keys
合并两部分，保存不相等的键数，并添加相等键的计数。
strike out any with count = 0 (mod 3)
用count = 0 (mod 3)删除任何值
return that as the set of values for the current part of the input.
作为输入的当前部分的一组值返回。

Without even trying to optimize things beyond the basic algorithm (e.g., not worrying about storing only two bits per count), this seems to do pretty well. I've included code to generate a reasonably large test case (e.g., 1500+ items) and print out the sizes of the maps it's creating. At any given time, it seems to have a maximum of around 50 items in the maps it creates (i.e., it only uses two maps at a time, and the largest I've seen is around 25 items). Technically, as it stands I believe this is currently something like O(N log N), but if you switched to a hash-based container, I believe you'd expect O(N).

甚至没有尝试去优化基本算法之外的东西(例如，不用担心只存储两个字节)，这似乎做得很好。我已经包含了生成一个相当大的测试用例(例如，1500+项)的代码，并打印出它正在创建的地图的大小。在任何给定的时间内，它在其创建的地图中似乎最多有50个条目(即:它一次只使用两张地图，而我所见过的最大的只有25个。从技术上讲，我认为这是类似于O(N log N)的东西，但是如果你切换到一个基于哈希的容器，我相信你会期待O(N)。

#include <map>
#include <iterator>
#include <iostream>
#include <algorithm>
#include <utility>
#include <vector>
#include <time.h>

class zero_mod { 
    unsigned base;
public:
    zero_mod(unsigned b) : base(b) {}

    bool operator()(std::pair<int, int> const &v) { 
        return v.second % base == 0; 
    }
};

// merge two maps together -- all keys from both maps, and the sums 
// of equal values.
// Then remove any items with a value congruent to 0 (mod 3)
//
std::map<int, int> 
merge(std::map<int, int> const &left, std::map<int, int> const &right) { 
    std::map<int, int>::const_iterator p, pos;
    std::map<int, int> temp, ret;

    std::copy(left.begin(), left.end(), std::inserter(temp, temp.end()));
    for (p=right.begin(); p!=right.end(); ++p) 
        temp[p->first] += p->second;
    std::remove_copy_if(temp.begin(), temp.end(), 
                        std::inserter(ret, ret.end()), 
                        zero_mod(3));
    return ret;
}   

// Recursively find items with counts != 0 (mod 3):    
std::map<int, int> 
do_count(std::vector<int> const &input, size_t left, size_t right) { 
    std::map<int, int> left_counts, right_counts, temp, ret;

    if (right - left <= 2) {
        for (size_t i=left; i!=right; ++i) 
            ++ret[input[i]];
        return ret;
    }
    size_t middle = left + (right-left)/2;
    left_counts = do_count(input, left, middle);
    right_counts = do_count(input, middle, right);
    ret = merge(left_counts, right_counts);

    // show the size of the map to get an idea of how much storage we're using.
    std::cerr << "Size: " << ret.size() << "\t";
    return ret;
}

std::map<int, int> count(std::vector<int> const &input) { 
    return do_count(input, 0, input.size());
}

namespace std { 
    ostream &operator<<(ostream &os, pair<int, int> const &p) { 
        return os << p.first;
    }
}

int main() {
    srand(time(NULL));
    std::vector<int> test;

    // generate a bunch of data by inserting packets of 3 items    
    for (int i=0; i<100; i++) {
        int packets = std::rand() % 10;
        int value = rand() % 50;
        for (int i=0; i<packets * 3; i++) 
            test.push_back(value);
    }

    // then remove one item, so that value will not occur a multiple of 3 times
    size_t pos = rand() % test.size();
    std::cerr << "Removing: " << test[pos] << " at position: " << pos << "\n";

    test.erase(test.begin()+pos);

    std::cerr << "Overall size: " << test.size() << "\n";

    std::random_shuffle(test.begin(), test.end());

    // Note that we use a map of results, so this will work if multiple values
    // occur the wrong number of times.    
    std::map<int, int> results = count(test);

    // show the results. Should match the value shown as removed above:
    std::copy(results.begin(), results.end(), 
        std::ostream_iterator<std::pair<int, int> >(std::cout, "\n"));
    return 0;
}

#1