在2个排序数组中查找公共元素[重复]

Possible Duplicate:
The intersection of two sorted arrays

可能重复:两个排序数组的交集

we have two sorted arrays A and B, besides compare one with all the elements in other array, how to design a best algorithm to find the array with their common elements?

我们有两个排序的数组A和B,除了比较一个与其他数组中的所有元素之外,如何设计一个最佳算法来查找具有它们共同元素的数组?

3 个解决方案

#1

Hold two pointers: one for each array.

保持两个指针:每个数组一个。

i <- 0, j <- 0
repeat while i < length(arr1) and j < length(arr2):
    if arr1[i] > arr2[j]: increase j
    else if arr1[i] < arr2[j]: increase i
    else : output arr[i], increase both pointers

The idea is, if the data is sorted, if the element is "too big" in one array, it will be "too big" for all other elements left in the array - since it is sorted.

我们的想法是,如果对数据进行排序,如果元素在一个数组中“太大”,那么对于数组中剩余的所有其他元素,它将“太大” - 因为它已经过排序。

This solution requires a single traversal on the data. O(n) (with good constants as well).

此解决方案需要对数据进行单次遍历。 O(n)(也具有良好的常数)。

#2

If the lengths of two arrays (say, A has N elements and B has M elements) are similar, then the best approach would be to perform linear search of one array's elements in another array. Of course, since the arrays are sorted, the next search should begin where the previous search has stopped. This is the classic principle used in "sorted array merge" algorithm. The complexity on O(N + M).

如果两个数组的长度(例如,A有N个元素,B有M个元素)相似,那么最好的方法是对另一个数组中的一个数组元素进行线性搜索。当然,由于数组已排序,下一次搜索应该从上一次搜索停止的地方开始。这是“排序数组合并”算法中使用的经典原理。 O(N + M)的复杂性。

If the lengths are significantly different (say, M << N), then a much more optimal approach would be to iterate through elements of the shorter array and use binary search to look for these values in the longer array. The complexity is O(M * log N) in that case.

如果长度明显不同(例如,M << N),那么更优化的方法是迭代较短数组的元素并使用二进制搜索在较长数组中查找这些值。在这种情况下,复杂度为O(M * log N)。

As you can see O(M * log N) is better than O(N + M) if M is much smaller than N, and worse otherwise.

如你所见,如果M远小于N,则O(M * log N)优于O(N + M),否则更糟。

The difference in array sizes which should trigger the switch from one approach to another depends on some practical considerations. If should be chosen based on practical experiments with your data.

应该触发从一种方法切换到另一种方法的阵列大小的差异取决于一些实际考虑因素。如果应根据您的数据的实际实验选择。

These two approaches (linear and binary searches) can be "blended" into a single algorithm. Let's assume M <= N. In that case let's choose step value S = [N / M]. You take first element from array A and perform a straddled linear search for that element in array B with step S, meaning that you check elements B[0], B[S], B[2*S], B[3*S], ... and so on. Once you find the index range [S*i, S*(i+1)] that potentially contains the element you are searching for, you switch to binary search inside that segment of array B. Done. The straddled linear search for the next element of A begins where the previous search left off. (As a side note, it might make sense to choose the value of S equal to a power of 2).

这两种方法(线性和二进制搜索)可以“混合”到单个算法中。假设M <= N.在这种情况下,让我们选择步长值S = [N / M]。从阵列A获取第一个元素,并使用步骤S对阵列B中的元素执行跨越线性搜索,这意味着您检查元素B [0],B [S],B [2 * S],B [3 * S ], ... 等等。找到可能包含要搜索的元素的索引范围[S * i,S *(i + 1)]后,切换到阵列B的该段内的二进制搜索。完成。对于A的下一个元素的跨越线性搜索从前一个搜索停止的地方开始。 (作为旁注,选择S的值等于2的幂可能是有意义的)。

This "blended" algorithm is the most asymptotically optimal search/merge algorithm for two sorted arrays in existence. However, in practice the more simple approach with choosing either binary or linear search depending on relative sizes of the arrays works perfectly well.

这种“混合”算法是存在的两个有序数组的最渐近最优的搜索/合并算法。然而,在实践中,根据阵列的相对大小选择二进制或线性搜索的更简单的方法非常有效。

#3

besides compare one with all the elements in other array

除了比较一个与其他数组中的所有元素

You will have to compare A[] to B[] in order to know that they are the same -- unless you know a lot about what kind of data they can hold. The nature of the comparison probably has many solutions and can be optimized as required.

您必须将A []与B []进行比较才能知道它们是相同的 - 除非您对它们可以容纳的数据类型有很多了解。比较的性质可能有许多解决方案,可以根据需要进行优化。

If the arrays are very strictly created ie only sequential values of a known pattern and always starts from a known point you could just look at the length of each array and know whether or not all items are common.

如果数组是非常严格创建的,即只有已知模式的连续值,并且始终从已知点开始,您只需查看每个数组的长度,并知道所有项是否通用。

This unfortunately doesn't sound like a very realistic or useful array and so you are back to checking for A[i] in B[]

遗憾的是,这听起来并不像一个非常现实或有用的数组,所以你回到检查B []中的A [i]

#1