查找数组中所有元素是否不同的最快方法？

I am looking for a faster way to find whether an array of elements contains distinct elements only. The worst thing to do is to take each element and compare it to every other element in the array. Next best would be to sort the list and then compare, which still does not improves this much. Is there any other way to do this?

我正在寻找一种更快的方法来查找元素数组是否仅包含不同的元素。最糟糕的事情是采用每个元素并将其与数组中的每个其他元素进行比较。接下来最好的方法是对列表进行排序,然后进行比较,但仍然没有提高这一点。有没有其他方法可以做到这一点?

3 个解决方案

#1

Brute-force:

Brute-force (checking every element with every other element) takes O(n²).

暴力(用每个其他元素检查每个元素)需要O(n2)。

Sorting:

Sorting takes O(n log n), which is generally considered to be a fairly decent running time.

排序需要O(n log n),这通常被认为是相当不错的运行时间。

Sorting has the advantage above the below (hash table) approach in that it can be done in-place (O(1) extra space), where-as the below takes O(n) extra space.

排序具有高于以下(哈希表)方法的优点,因为它可以就地完成(O(1)额外空间),其中 - 如下所示占用O(n)额外空间。

Hash table:

An alternative is to use a hash table.

另一种方法是使用哈希表。

For each item:

对于每个项目:

Check if that item exists in the hash table (if it does, all items are not distinct) and

检查哈希表中是否存在该项(如果存在,则所有项都不相同)和

Insert that item into the hash table

将该项插入哈希表

Since insert and contains queries run in expected O(1) on a hash table, the overall running time would be expected O(n), and, as mentioned above, O(n) extra space.

由于insert和contains查询在哈希表上的预期O(1)中运行,因此总体运行时间将为O(n),并且如上所述,O(n)额外空间。

Bit array:

Another alternative, if the elements are all integers in some given range, is to have a bit array with size equal to the range of integers.

另一种替代方案是,如果元素在某个给定范围内都是整数,则具有大小等于整数范围的位数组。

Similarly to what was done for the hash table approach, for each element, you'd check whether the applicable bit is set, and then set it.

与哈希表方法的操作类似,对于每个元素,您将检查是否设置了适用位,然后进行设置。

This takes O(m + n) time and O(m) extra space where m is the range of integers and n is the size of the array (unless you consider allocating the array to be free, in which case it just takes O(n) time).

这需要O(m + n)时间和O(m)额外空间,其中m是整数范围,n是数组的大小(除非你考虑将数组分配为空闲,在这种情况下它只需要O( n)时间)。

#2

Create a Red and Black tree where elements as keys and number of occurrences are value. You can then navigate the tree. Time and space complexity is O(n) where n is the number of elements. Key benefits using a red and black tree include consistent performance and simple memory management - excellent choice for a distributed environement. Perspectives welcome.

创建一个红色和黑色树,其中元素作为键和出现次数是值。然后,您可以导航树。时间和空间复杂度为O(n),其中n是元素的数量。使用红色和黑色树的主要好处包括一致的性能和简单的内存管理 - 分布式环境的最佳选择。观点欢迎。

#3

Alternative solution (interesting only from theoretic point of view):

替代解决方案(仅从理论角度来看有趣):

I think you can adapt the Quickselect algorithm. In short, this algorithm runs in the same way as Quick sort, but it only splits the array in two groups, according to some chosen pivot (less and more than the pivot, respectively), thus the sorting is omitted. It's average case performance is O(n).

我认为你可以适应Quickselect算法。简而言之,该算法以与快速排序相同的方式运行,但它只根据一些选定的枢轴(分别少于和多于枢轴)将数组分成两组,因此省略排序。它的平均案例表现是O(n)。

My idea is to look for elements equal to the chosen pivot on each step. In this way, whenever there are more than two elements, we will compare the pivot to each element. If we have found a duplicate, we have the answer. Otherwise we split the problem in two similar ones, but with smaller size and run the algorithm on them.

我的想法是在每一步中寻找与所选枢轴相等的元素。这样,只要有两个以上的元素,我们就会比较每个元素的数据。如果我们找到了重复,我们就有了答案。否则我们将问题分成两个相似的问题,但是尺寸较小并在其上运行算法。

Disclaimer: The worst case performance of Quickselect is O(n^2). Therefore, using a hash table is way more time efficient.

免责声明:Quickselect的最坏情况是O(n ^ 2)。因此,使用哈希表更有时间效率。

However, as Quickselect is an in-place algorithm, it requires only constant memory overhead as opposed to linear additional memory for a hash table (not that it matters nowadays).

但是,由于Quickselect是一种就地算法,它只需要恒定的内存开销,而不是哈希表的线性附加内存(现在并不重要)。

#1