数组中不同元素的数量

Is it possible to compute the number of different elements in an array in linear time and constant space? Let us say it's an array of long integers, and you can not allocate an array of length sizeof(long).

是否可以在线性时间和恒定空间中计算数组中不同元素的数量?让我们说它是一个长整数数组,你不能分配长度sizeof(long)的数组。

P.S. Not homework, just curious. I've got a book that sort of implies that it is possible.

附:不是作业,只是好奇。我有一本书暗示它是可能的。

7 个解决方案

#1

This is the Element uniqueness problem, for which the lower bound is Ω( n log n ), for comparison-based models. The obvious hashing or bucket sorting solution all requires linear space too, so I'm not sure this is possible.

这是元素唯一性问题,对于基于比较的模型,下限为Ω(n log n)。明显的散列或桶分类解决方案都需要线性空间,所以我不确定这是否可行。

#2

You can't use constant space. You can use O(number of different elements) space; that's what a HashSet does.

你不能使用恒定的空间。你可以使用O(不同元素的数量)空间;这就是HashSet的作用。

#3

You can use any sorting algorithm and count the number of different adjacent elements in the array.

您可以使用任何排序算法并计算数组中不同相邻元素的数量。

#4

I do not think this can be done in linear time. One algorithm to solve in O(n log n) requires first sorting the array (then the comparisons become trivial).

我不认为这可以在线性时间内完成。要在O(n log n)中求解的一种算法需要首先对数组进行排序(然后比较变得微不足道)。

#5

If you are guaranteed that the numbers in the array are bounded above and below, by say a and b, then you could allocate an array of size b - a, and use it to keep track of which numbers have been seen.

如果你保证数组中的数字在上下限制,比如a和b,那么你可以分配一个大小为b-a的数组,并用它来跟踪看到的数字。

i.e., you would move through your input array take each number, and mark a true in your target array at that spot. You would increment a counter of distinct numbers only when you encounter a number whose position in your storage array is false.

也就是说,您将在输入数组中移动每个数字,并在该位置的目标数组中标记为true。只有在遇到存储阵列中的位置为假的数字时,才会增加不同数字的计数器。

#6

Assuming we can partially destroy the input, here's an algorithm for n words of O(log n) bits.

假设我们可以部分破坏输入,这里是一个O(log n)位的n个字的算法。

Find the element of order sqrt(n) via linear-time selection. Partition the array using this element as a pivot (O(n)). Using brute force, count the number of different elements in the partition of length sqrt(n). (This is O(sqrt(n)^2) = O(n).) Now use an in-place radix sort on the rest, where each "digit" is log(sqrt(n)) = log(n)/2 bits and we use the first partition to store the digit counts.

通过线性时间选择找到顺序sqrt(n)的元素。使用此元素作为轴(O(n))对数组进行分区。使用强力,计算长度为sqrt(n)的分区中不同元素的数量。 (这是O(sqrt(n)^ 2)= O(n)。)现在在其余部分使用就地基数排序,其中每个“数字”是log(sqrt(n))= log(n)/ 2位,我们使用第一个分区来存储数字计数。

If you consider streaming algorithms only ( http://en.wikipedia.org/wiki/Streaming_algorithm ), then it's impossible to get an exact answer with o(n) bits of storage via a communication complexity lower bound ( http://en.wikipedia.org/wiki/Communication_complexity ), but possible to approximate the answer using randomness and little space (Alon, Matias, and Szegedy).

如果你只考虑流媒体算法(http://en.wikipedia.org/wiki/Streaming_algorithm),那么通过通信复杂性下限就不可能得到o(n)位存储的精确答案(http:// en .wikipedia.org / wiki / Communication_complexity),但可以使用随机性和小空间(Alon,Matias和Szegedy)来近似答案。

#7

-1

This can be done with a bucket approach when assuming that there are only a constant number of different values. Make a flag for each value (still constant space). Traverse the list and flag the occured values. If you happen to flag an already flagged value, you've found a duplicate. You have to traverse the buckets for each element in the list. But that's still linear time.

当假设只有恒定数量的不同值时,可以使用桶方法来完成。为每个值(仍为常量空间)创建一个标志。遍历列表并标记出现的值。如果您碰巧标记了已标记的值,则表示您找到了重复项。您必须遍历列表中每个元素的存储桶。但这仍然是线性时间。

#1