JavaScript: Removing duplicates from an array of arrays

I am using JavaScript and need to go through an array of arrays, determine whether it contains any duplicate arrays, and delete those duplicates. Runtime is of the essence here, so I was wondering what the most efficient way of doing this is.

Is a hash table desirable in this case? The idea would be to hash each sequence and then use the hash to determine whether that sequence occurs again. Each sequence is an array within the master array, and any duplicates would be other arrays within the same master array. Furthermore, it is extremely important that the individual arrays keep their order (i.e. the elements within each array must always keep their position). Also, all elements of the individual arrays are string values.

Example: Assume that there is an array A whose elements are in turn the following arrays:

A[0] = ["one", "two", "three", "four"]
A[1] = ["two", "one", "three", "four"]
A[2] = ["one", "two", "three", "four"]

In the above example, A[0] and A[2] are duplicates, so the function should return A[0] and A[1], leaving only one instance of each distinct array.

2 Answers

#1

Keep an object whose keys are the joined elements of each array. If an array's key is not found in the object, add the array to the output array and add the key to the object.

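// Object.create(null) has no inherited keys, so an array such as
// ["constructor"] is not mistaken for one that was already seen.
var hash = Object.create(null);
var out = [];
for (var i = 0, l = A.length; i < l; i++) {
  // Joined key; assumes no element contains the '|' separator.
  var key = A[i].join('|');
  if (!hash[key]) {
    // First time this sequence appears: keep it and remember its key.
    out.push(A[i]);
    hash[key] = 'found';
  }
}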
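
The choice of key matters when elements may contain the separator. The following is a minimal sketch, not part of the answer itself: `dedupeBy` and its `keyOf` parameter are hypothetical names, and a `Set` stands in for the plain object. It compares a joined key with a `JSON.stringify` key on two different arrays that happen to join to the same string.

```javascript
// Hypothetical helper: dedupeBy is an illustration, not part of the answer.
// keyOf turns an array into the string used to detect duplicates.
function dedupeBy(arrays, keyOf) {
  const seen = new Set();
  const out = [];
  for (const arr of arrays) {
    const key = keyOf(arr);
    if (!seen.has(key)) {
      seen.add(key);
      out.push(arr); // keep the first occurrence, elements untouched
    }
  }
  return out;
}

// Two different arrays whose '|'-joined keys are both "a|b|c".
const tricky = [['a|b', 'c'], ['a', 'b|c']];
const joined = dedupeBy(tricky, (arr) => arr.join('|'));
const exact = dedupeBy(tricky, (arr) => JSON.stringify(arr));
console.log(joined.length); // 1 (the second array is wrongly dropped)
console.log(exact.length);  // 2 (both arrays are kept)
```

If the strings are known never to contain the separator, the joined key is fine; `JSON.stringify` is simply the safer default when that is not guaranteed.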

#2

OK, let us first look at the complexity of the naive solution. If there are n arrays, each with at most k entries, you need O(n^2 * k) comparisons, because each of the n arrays has to be compared to the n-1 others, with up to k element comparisons each. The space complexity is O(n*k).
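
For reference, the naive approach could look roughly like the sketch below. The function names `sameArray` and `dedupeNaive` are made up for illustration and do not come from the question.

```javascript
// Hypothetical helpers sketching the naive pairwise approach.
function sameArray(a, b) {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false; // up to k element comparisons
  }
  return true;
}

function dedupeNaive(arrays) {
  const out = [];
  for (const candidate of arrays) {
    // Compare against every array kept so far: up to n - 1 comparisons.
    if (!out.some((kept) => sameArray(kept, candidate))) {
      out.push(candidate);
    }
  }
  return out;
}

const A = [
  ['one', 'two', 'three', 'four'],
  ['two', 'one', 'three', 'four'],
  ['one', 'two', 'three', 'four'],
];
console.log(dedupeNaive(A).length); // 2
```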

So if you are willing to trade space for better performance, you can do the following. (Short disclaimer: I assume all your arrays have the same number of elements, k, which your question suggests but does not confirm.)

Go through the arrays one by one. For each array, take its first element, say a, and use a hash map to check whether you have seen it as a first element before. If not, create a tree with a as its root and store it under a in the hash map; either way, make that root your current node. Then, for each subsequent entry in the array, check whether the current node already has a child with that value. If it does not, add one; then descend to that child. So if the second entry is b, you add b as a child of a.

Your tree now looks like this (left to right: root to children):

a - b

Having c as the third entry works exactly the same:

a - b - c

Now we skip forward and look at an array [a, c, d]. You first find the existing tree for element a. For the second element, you check whether c is already a child of a. If not, add it:

  - b - c
a
  - c

The same goes for the next entry:

  - b - c
a
  - c - d

Let us now see what happens when we check an array that we saw before: [a, b, c]

First we check a, see that a tree already exists, and get it from the hash map. Next, we notice that a has a child named b, so we descend to b. For the last entry, c, we see that it is already there too. Since no new node had to be created, we have encountered a duplicate, which we can drop.

Sorry for the improvised drawing; I hope it gets the idea across. The point is to go through each array only once and store it in a non-redundant way, so the time complexity is O(n*k). The space used grows, but it is bounded by O(n*k): in the worst case no two arrays share a prefix, which gives the same space complexity as the naive solution.
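
A rough sketch of this prefix-tree idea follows. Several details are assumptions rather than part of the description above: the name `dedupeWithTrie` is invented, nested `Map` objects serve as both the hash map of first elements and the tree nodes, and an `END` marker is added so that arrays of different lengths (for example [a, b] after [a, b, c]) are not mistaken for duplicates.

```javascript
// Hypothetical helper: a sketch of the prefix-tree idea, not a tuned implementation.
function dedupeWithTrie(arrays) {
  const root = new Map();    // first elements map to their trees
  const END = Symbol('end'); // assumption: marks where a complete array ends
  const out = [];
  for (const arr of arrays) {
    let node = root;
    for (const item of arr) {
      if (!node.has(item)) {
        node.set(item, new Map()); // unseen prefix: add a child node
      }
      node = node.get(item);       // descend to that child
    }
    if (!node.has(END)) {
      // No earlier array ended exactly here, so this one is new.
      node.set(END, true);
      out.push(arr);
    }
  }
  return out;
}

const input = [
  ['one', 'two', 'three', 'four'],
  ['two', 'one', 'three', 'four'],
  ['one', 'two', 'three', 'four'],
];
console.log(dedupeWithTrie(input).length); // 2
```

Each array is walked exactly once and its elements are never reordered, which matches the requirement that the individual arrays keep their order. Whether this beats a simple string key in practice would need measuring on real data.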

Hope I didn't overlook something.
