In an array, one value appears twice. How do you determine which one?

Assume that the array has integers between 1 and 1,000,000.

I know some popular ways of solving this problem:

If every number between 1 and 1,000,000 is present, sum the array elements and subtract the expected total n(n+1)/2; the difference is the duplicate
Use a hash map (needs extra memory)
Use a bit map (less memory overhead)
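
The sum-based method listed above could be sketched in C roughly as follows. The function name `duplicate_by_sum` is made up for illustration, and the sketch assumes the array holds n + 1 entries: every integer from 1 to n exactly once, plus one repeat. A 64-bit accumulator is used because for n = 1,000,000 the total is about 5 * 10^11, which does not fit in a 32-bit int.

```c
#include <stddef.h>

/* Hypothetical helper, not from the original post.
   Assumes `a` holds len = n + 1 values: every integer in 1..n exactly once,
   plus one repeated value. Returns the repeated value. */
long long duplicate_by_sum(const int *a, size_t len)
{
    long long n = (long long)len - 1;
    long long expected = n * (n + 1) / 2;   /* 1 + 2 + ... + n */
    long long actual = 0;

    for (size_t i = 0; i < len; i++)
        actual += a[i];                     /* sum of what is really there */

    return actual - expected;               /* the surplus is the duplicate */
}
```

For the five-element array 3, 1, 4, 2, 3 the elements sum to 13 and the expected total for n = 4 is 10, so the function returns 3.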

I recently came across another solution and I need some help in understanding the logic behind it:

Keep a single radix accumulator. You exclusive-or the accumulator with both the index and the value at that index.

The fact that x ^ C ^ x == C is useful here, since each number will be xor'd twice, except the one that's in there twice, which will appear 3 times. (x ^ x ^ x == x) And the final index, which will appear once. So if we seed the accumulator with the final index, the accumulator's final value will be the number that is in the list twice.

I would appreciate it if someone could help me understand the logic behind this approach (with a small example!).

4 answers

#1

Assume you have an accumulator

int accumulator = 0;

At each step of your loop, you XOR the accumulator with i and v, where i is the index of the loop iteration and v is the value in the ith position of the array.

accumulator ^= (i ^ v);

Normally, i and v will be the same number so you will end up doing

accumulator ^= (i ^ i);

But i ^ i == 0, so this is a no-op and the value of the accumulator is left untouched. At this point I should say that the order of the numbers in the array does not matter, because XOR is commutative and associative: even if the array is shuffled to begin with, the result at the end will still be 0 (the initial value of the accumulator).

Now what if a number occurs twice in the array? This number will appear three times in the XORing (once for the index equal to the number, once for the normal appearance of the number, and once for the extra appearance). Furthermore, one of the other numbers will appear only once (only as an index).

This solution now assumes that the number that appears only once is equal to the last index of the array, or in other words, that the values in the array form a contiguous range starting at the first index to be processed (edit: thanks to caf for this heads-up comment; this is what I had in mind, but I messed it up when writing). With this (N appears only once) as a given, consider that starting with

int accumulator = N;

effectively makes N again appear twice in the XORing. At this point, we are left with numbers that only appear exactly twice, and just the one number that appears three times. Since the twice-appearing numbers will XOR out to 0, the final value of the accumulator will be equal to the number that appears three times (i.e. one extra).
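
Putting the pieces of this answer together, a minimal sketch in C might look like the following. The function name `find_duplicate_xor` is invented here. The sketch assumes the array holds n + 1 entries with every value from 1 to n once plus one repeat, and it XORs in the 1-based position `i + 1` so that indices and values cover the same range, which is the consistency the argument relies on.

```c
#include <stddef.h>

/* Hypothetical helper, not from the original post.
   Assumes `a` holds len = n + 1 values: every integer in 1..n exactly once,
   plus one repeated value. Positions are treated as 1-based (1..n+1). */
unsigned find_duplicate_xor(const unsigned *a, size_t len)
{
    unsigned acc = (unsigned)len;           /* seed: the last 1-based index, n + 1 */

    for (size_t i = 0; i < len; i++)
        acc ^= (unsigned)(i + 1) ^ a[i];    /* fold in the position and the value */

    /* Every number now appears an even number of times except the
       duplicate, which appears three times, so only it survives. */
    return acc;
}
```

Called on the array 3, 1, 4, 2, 3 (so `len` is 5 and the seed is 5), the accumulator ends at 3.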

#2

Each number between 1 and 10,001 inclusive appears as an array index. (Aren't C arrays 0-indexed? Well, it doesn't make a difference provided we're consistent about whether the array values and indices both start at 0 or both start at 1. I'll go with the array starting at 1, since that's what the question seems to say.)

Anyway, yes, each number between 1 and 10,001 inclusive appears, precisely once, as an array index. Each number between 1 and 10,000 inclusive also appears as an array value precisely once, with the exception of the duplicated value which occurs twice. So mathematically, the calculation we're doing overall is the following:

1 xor 1 xor 2 xor 2 xor 3 xor 3 xor ... xor 10,000 xor 10,000 xor 10,001 xor D

where D is the duplicated value. Of course, the terms in the calculation probably don't appear in that order, but xor is commutative, so we can rearrange the terms however we like. And n xor n is 0 for each n. So the above simplifies to

10,001 xor D

xor this with 10,001 and you get D, the duplicated value.
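
As a small worked case of the same rearrangement, take the array 2, 3, 1, 2 at indices 1 to 4. The full calculation is 1 xor 2 xor 2 xor 3 xor 3 xor 1 xor 4 xor 2, which regroups to (1 xor 1) xor (2 xor 2) xor (3 xor 3) xor 4 xor 2 = 4 xor 2, and XORing that with the last index 4 leaves 2. The sketch below computes the index part and the value part separately to make the regrouping visible; the names `xor_1_to` and `duplicate_by_regrouping` are invented for illustration, and the array is assumed to hold every value from 1 to n once plus one repeat.

```c
#include <stddef.h>

/* XOR of every integer from 1 to m inclusive. */
static unsigned xor_1_to(unsigned m)
{
    unsigned x = 0;
    for (unsigned k = 1; k <= m; k++)
        x ^= k;
    return x;
}

/* Hypothetical helper, not from the original post.
   Assumes `a` holds len = n + 1 values: 1..n once each plus one repeat D. */
unsigned duplicate_by_regrouping(const unsigned *a, size_t len)
{
    unsigned values = 0;
    for (size_t i = 0; i < len; i++)
        values ^= a[i];                          /* 1 ^ 2 ^ ... ^ n ^ D */

    unsigned indices = xor_1_to((unsigned)len);  /* 1 ^ 2 ^ ... ^ (n + 1) */

    /* indices ^ values collapses to (n + 1) ^ D; one more XOR with
       the last index n + 1 cancels it and leaves D. */
    return indices ^ values ^ (unsigned)len;
}
```

Because XOR is commutative and associative, this gives the same result as interleaving index and value in a single loop.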

#3

The logic is that you only have to store the accumulator value, and only need to go through the array once. That's pretty clever.

Of course, whether this is the best method in practice depends on how expensive the exclusive-or is to compute and on how large your array is. If the values in the array are randomly distributed, a different method may be quicker even if it uses more memory, because it can stop as soon as it sees the duplicate, which is likely to happen well before the entire array has been checked.
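
One such memory-for-speed alternative is a table of already-seen values that lets the scan stop at the first repeat. The following is only a sketch: `first_repeat` and its `max_value` parameter are hypothetical names, the values are assumed to lie in 1..max_value, and one `bool` per value is used for simplicity where a packed bit map would use less memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not from the original post.
   Assumes every a[i] lies in 1..max_value. Returns the first value that is
   seen a second time, or 0 if there is none or the allocation fails. */
int first_repeat(const int *a, size_t len, int max_value)
{
    bool *seen = calloc((size_t)max_value + 1, sizeof *seen);
    if (seen == NULL)
        return 0;

    int dup = 0;
    for (size_t i = 0; i < len; i++) {
        if (seen[a[i]]) {       /* second sighting: stop early */
            dup = a[i];
            break;
        }
        seen[a[i]] = true;
    }

    free(seen);
    return dup;
}
```

Unlike the XOR accumulator, this version does not need to visit every element once the duplicate has turned up, at the cost of the extra table.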

Of course if the array is sorted to begin with, things are considerably easier. So it depends very much on how the values are distributed throughout the array.

#4

The question is: are you interested in knowing how to do clever but purely academic xor tricks with little relevance to the real world, or do you want to know this because in the real world you may write programs that use arrays? This answer addresses the latter case.

The no-nonsense solution is to go through the whole array and sort it as you go. While you sort, make sure there are no duplicate values, i.e. implement the abstract data type "set". This will probably require a second array to be allocated, and the sorting will be time-consuming. Whether it is more or less time-consuming than clever xor tricks, I don't know.

However, what good is an array of n unsorted values to you in the real world? If they are unsorted we have to assume that their order is important somehow, so the original array might have to be preserved. If you want to search through the original array or analyse it for duplicates, the median value and so on, you really want a sorted version of it. Once you have it sorted, you can binary search it in O(log n).
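
A rough sketch of that idea in C, sorting a copy so the original order is preserved and then comparing neighbours. The names `cmp_int` and `duplicate_by_sorting` are made up for illustration, and this uses the standard library's `qsort` followed by a scan rather than checking for duplicates during the sort itself. Values are assumed to be positive so that 0 can signal "no duplicate".

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort: ascending order of int. */
static int cmp_int(const void *p, const void *q)
{
    int x = *(const int *)p;
    int y = *(const int *)q;
    return (x > y) - (x < y);
}

/* Hypothetical helper, not from the original post.
   Leaves `a` untouched. Returns a repeated value, or 0 if there is
   none or the allocation fails (values are assumed to be positive). */
int duplicate_by_sorting(const int *a, size_t len)
{
    int *copy = malloc(len * sizeof *copy);
    if (copy == NULL)
        return 0;

    memcpy(copy, a, len * sizeof *copy);
    qsort(copy, len, sizeof *copy, cmp_int);

    int dup = 0;
    for (size_t i = 1; i < len; i++) {
        if (copy[i] == copy[i - 1]) {   /* equal neighbours after sorting */
            dup = copy[i];
            break;
        }
    }

    free(copy);
    return dup;
}
```

The sort dominates the running time, and the second array costs as much memory as the input. Keeping the sorted copy instead of freeing it would allow later lookups by binary search.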
