使用Javascript数组计算集差的最快或最优雅的方法是什么?

时间:2021-01-15 20:19:23

Let A and B be two sets. I'm looking for really fast or elegant ways to compute the set difference (A - B or A \B, depending on your preference) between them. The two sets are stored and manipulated as Javascript arrays, as the title says.

让A和B是两个集合。我正在寻找非常快速或优雅的方法来计算它们之间的集差(A - B或A \B,取决于您的喜好)。如标题所示,这两个集合被存储并作为Javascript数组进行操作。

Notes:

注:

  • Gecko-specific tricks are okay
  • Gecko-specific技巧好
  • I'd prefer sticking to native functions (but I am open to a lightweight library if it's way faster)
  • 我宁愿坚持使用本机函数(但如果速度更快的话,我可以使用轻量级库)
  • I've seen, but not tested, JS.Set (see previous point)
  • 我看过,但没有测试过,JS。集(见以前的点)

Edit: I noticed a comment about sets containing duplicate elements. When I say "set" I'm referring to the mathematical definition, which means (among other things) that they do not contain duplicate elements.

编辑:我注意到一个关于包含重复元素的集合的评论。当我说“set”时,我指的是数学定义,这意味着(除其他外)它们不包含重复的元素。

11 个解决方案

#1


137  

if don't know if this is most effective, but perhaps the shortest

如果不知道这是否是最有效的,也许是最短的

A = [1, 2, 3, 4];
B = [1, 3, 4, 7];

diff = A.filter(function(x) { return B.indexOf(x) < 0 })

console.log(diff);

Updated to ES6:

ES6更新:

A = [1, 2, 3, 4];
B = [1, 3, 4, 7];

diff = A.filter(x => B.indexOf(x) < 0 );

console.log(diff);

#2


41  

Well, 7 years later, with ES6's Set object it's quite easy (but still not as compact as pythons A - B), and reportedly faster than indexOf for large arrays:

7年后,ES6的Set对象非常简单(但仍然没有python A - B那么紧凑),据说比大型数组的indexOf要快:

let a = new Set([1,2,3,4]);
let b = new Set([5,4,3,2]);

console.log(new Set([...a].filter(x => !b.has(x)))); //a\b => {1}
console.log(new Set([...b].filter(x => !a.has(x)))); //b\a => {5}
console.log(new Set([...a].filter(x => b.has(x))));  //a∩b => {2,3,4}

#3


16  

You can use an object as a map to avoid linearly scanning B for each element of A as in user187291's answer:

您可以使用对象作为映射来避免对a的每个元素进行线性扫描B,如user187291的回答:

function setMinus(A, B) {
    var map = {}, C = [];

    for(var i = B.length; i--; )
        map[B[i].toSource()] = null; // any other value would do

    for(var i = A.length; i--; ) {
        if(!map.hasOwnProperty(A[i].toSource()))
            C.push(A[i]);
    }

    return C;
}

The non-standard toSource() method is used to get unique property names; if all elements already have unique string representations (as is the case with numbers), you can speed up the code by dropping the toSource() invocations.

非标准的toSource()方法用于获取唯一的属性名;如果所有元素都有唯一的字符串表示(就像数字一样),那么可以通过删除toSource()调用来加快代码的速度。

#4


9  

The shortest, using jQuery, is:

最短的,使用jQuery,是:

var A = [1, 2, 3, 4];
var B = [1, 3, 4, 7];

var diff = $(A).not(B);

console.log(diff.toArray());
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

#5


5  

I would hash the array B, then keep values from the array A not present in B:

我将数组B哈希,然后保留数组A中不存在的值:

function getHash(array){
  // Hash an array into a set of properties
  //
  // params:
  //   array - (array) (!nil) the array to hash
  //
  // return: (object)
  //   hash object with one property set to true for each value in the array

  var hash = {};
  for (var i=0; i<array.length; i++){
    hash[ array[i] ] = true;
  }
  return hash;
}

function getDifference(a, b){
  // compute the difference a\b
  //
  // params:
  //   a - (array) (!nil) first array as a set of values (no duplicates)
  //   b - (array) (!nil) second array as a set of values (no duplicates)
  //
  // return: (array)
  //   the set of values (no duplicates) in array a and not in b, 
  //   listed in the same order as in array a.

  var hash = getHash(b);
  var diff = [];
  for (var i=0; i<a.length; i++){
    var value = a[i];
    if ( !hash[value]){
      diff.push(value);
    }
  }
  return diff;
}

#6


4  

Incorporating the idea from Christoph and assuming a couple of non-standard iteration methods on arrays and objects/hashes (each and friends), we can get set difference, union and intersection in linear time in about 20 lines total:

结合Christoph的想法,假设在数组和对象/散列(每个和朋友)上有一些非标准的迭代方法,我们可以在大约20行中得到设置差、合并和相交的线性时间:

var setOPs = {
  minusAB : function (a, b) {
    var h = {};
    b.each(function (v) { h[v] = true; });
    return a.filter(function (v) { return !h.hasOwnProperty(v); });
  },
  unionAB : function (a, b) {
    var h = {}, f = function (v) { h[v] = true; };
    a.each(f);
    b.each(f);
    return myUtils.keys(h);
  },
  intersectAB : function (a, b) {
    var h = {};
    a.each(function (v) { h[v] = 1; });
    b.each(function (v) { h[v] = (h[v] || 0) + 1; });
    var fnSel = function (v, count) { return count > 1; };
    var fnVal = function (v, c) { return v; };
    return myUtils.select(h, fnSel, fnVal);
  }
};

This assumes that each and filter are defined for arrays, and that we have two utility methods:

假设每个和过滤器都是为数组定义的,并且我们有两个实用方法:

  • myUtils.keys(hash): returns an array with the keys of the hash

    myUtils.keys(hash):返回一个具有散列的键的数组

  • myUtils.select(hash, fnSelector, fnEvaluator): returns an array with the results of calling fnEvaluator on the key/value pairs for which fnSelector returns true.

    myUtils。select(hash, fnSelector, fnEvaluator):返回一个数组,其结果是在fnSelector返回的键/值对上调用fnEvaluator。

The select() is loosely inspired by Common Lisp, and is merely filter() and map() rolled into one. (It would be better to have them defined on Object.prototype, but doing so wrecks havoc with jQuery, so I settled for static utility methods.)

select()受到了Common Lisp的松散启发,只是将filter()和map()合并为一个。(最好把它们定义在对象上。原型,但这样做破坏了jQuery,所以我选择了静态实用工具方法。

Performance: Testing with

性能:测试

var a = [], b = [];
for (var i = 100000; i--; ) {
  if (i % 2 !== 0) a.push(i);
  if (i % 3 !== 0) b.push(i);
}

gives two sets with 50,000 and 66,666 elements. With these values A-B takes about 75ms, while union and intersection are about 150ms each. (Mac Safari 4.0, using Javascript Date for timing.)

给出了包含50,000和66,666个元素的两个集合。A-B的值约为75ms,而union和crossroads的值约为150ms。(Mac Safari 4.0,使用Javascript日期计时)

I think that's decent payoff for 20 lines of code.

我认为这对20行代码来说是个不错的回报。

#7


3  

Using Underscore.js (Library for functional JS)

使用下划线。js(函数js库)

>>> var foo = [1,2,3]
>>> var bar = [1,2,4]
>>> _.difference(foo, bar);
[4]

#8


1  

This works, but I think another one is much more shorter, and elegant too

这是可行的,但我认为另一个更短,也更优雅

A = [1, 'a', 'b', 12];
B = ['a', 3, 4, 'b'];

diff_set = {
    ar : {},
    diff : Array(),
    remove_set : function(a) { ar = a; return this; },
    remove: function (el) {
        if(ar.indexOf(el)<0) this.diff.push(el);
    }
}

A.forEach(diff_set.remove_set(B).remove,diff_set);
C = diff_set.diff;

#9


1  

As for the fasted way, this isn't so elegant but I've run some tests to be sure. Loading one array as an object is far faster to process in large quantities:

至于固定方式,这并不是那么优雅,但我已经运行了一些测试来确定。将一个数组作为对象加载,处理大量数据要快得多:

var t, a, b, c, A;

    // Fill some arrays to compare
a = Array(30000).fill(0).map(function(v,i) {
    return i.toFixed();
});
b = Array(20000).fill(0).map(function(v,i) {
    return (i*2).toFixed();
});

    // Simple indexOf inside filter
t = Date.now();
c = b.filter(function(v) { return a.indexOf(v) < 0; });
console.log('completed indexOf in %j ms with result %j length', Date.now() - t, c.length);

    // Load `a` as Object `A` first to avoid indexOf in filter
t = Date.now();
A = {};
a.forEach(function(v) { A[v] = true; });
c = b.filter(function(v) { return !a[v]; });
console.log('completed Object in %j ms with result %j length', Date.now() - t, c.length);

Results:

结果:

completed indexOf in 1219 ms with result 5000 length
completed Object in 8 ms with result 5000 length

However, this works with strings only. If you plan to compare numbered sets you'll want to map results with parseInt.

但是,这只适用于字符串。如果您计划比较编号集,您将希望使用parseInt映射结果。

#10


1  

You can use this lightweight array-diff open source component.

您可以使用这个轻量级的array-diff开源组件。

Example:

例子:

diff([1,2,3], [1,2,3,4,5]) // => [4,5]

It works by concating the two arrays passed and filtering included vals, returning an array representing the difference between the two arrays:

它通过对所传递的两个数组进行处理,并过滤其中包含的vals,返回一个表示两个数组之间差异的数组:

function diff(firstArray: any[], secondArray: any[]): any[] {
  return firstArray.concat(secondArray).filter((val) => {
    return !(firstArray.includes(val) && secondArray.includes(val));
  });
};

#11


1  

Some simple functions, borrowing from @milan's answer:

一些简单的函数,借用@milan的回答:

const setDifference = (a, b) => new Set([...a].filter(x => !b.has(x)));
const setIntersection = (a, b) => new Set([...a].filter(x => b.has(x)));
const setUnion = (a, b) => new Set([...a, ...b]);

Usage:

用法:

const a = new Set([1, 2]);
const b = new Set([2, 3]);

setDifference(a, b); // Set { 1 }
setIntersection(a, b); // Set { 2 }
setUnion(a, b); // Set { 1, 2, 3 }

#1


137  

if don't know if this is most effective, but perhaps the shortest

如果不知道这是否是最有效的,也许是最短的

A = [1, 2, 3, 4];
B = [1, 3, 4, 7];

diff = A.filter(function(x) { return B.indexOf(x) < 0 })

console.log(diff);

Updated to ES6:

ES6更新:

A = [1, 2, 3, 4];
B = [1, 3, 4, 7];

diff = A.filter(x => B.indexOf(x) < 0 );

console.log(diff);

#2


41  

Well, 7 years later, with ES6's Set object it's quite easy (but still not as compact as pythons A - B), and reportedly faster than indexOf for large arrays:

7年后,ES6的Set对象非常简单(但仍然没有python A - B那么紧凑),据说比大型数组的indexOf要快:

let a = new Set([1,2,3,4]);
let b = new Set([5,4,3,2]);

console.log(new Set([...a].filter(x => !b.has(x)))); //a\b => {1}
console.log(new Set([...b].filter(x => !a.has(x)))); //b\a => {5}
console.log(new Set([...a].filter(x => b.has(x))));  //a∩b => {2,3,4}

#3


16  

You can use an object as a map to avoid linearly scanning B for each element of A as in user187291's answer:

您可以使用对象作为映射来避免对a的每个元素进行线性扫描B,如user187291的回答:

function setMinus(A, B) {
    var map = {}, C = [];

    for(var i = B.length; i--; )
        map[B[i].toSource()] = null; // any other value would do

    for(var i = A.length; i--; ) {
        if(!map.hasOwnProperty(A[i].toSource()))
            C.push(A[i]);
    }

    return C;
}

The non-standard toSource() method is used to get unique property names; if all elements already have unique string representations (as is the case with numbers), you can speed up the code by dropping the toSource() invocations.

非标准的toSource()方法用于获取唯一的属性名;如果所有元素都有唯一的字符串表示(就像数字一样),那么可以通过删除toSource()调用来加快代码的速度。

#4


9  

The shortest, using jQuery, is:

最短的,使用jQuery,是:

var A = [1, 2, 3, 4];
var B = [1, 3, 4, 7];

var diff = $(A).not(B);

console.log(diff.toArray());
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

#5


5  

I would hash the array B, then keep values from the array A not present in B:

我将数组B哈希,然后保留数组A中不存在的值:

function getHash(array){
  // Hash an array into a set of properties
  //
  // params:
  //   array - (array) (!nil) the array to hash
  //
  // return: (object)
  //   hash object with one property set to true for each value in the array

  var hash = {};
  for (var i=0; i<array.length; i++){
    hash[ array[i] ] = true;
  }
  return hash;
}

function getDifference(a, b){
  // compute the difference a\b
  //
  // params:
  //   a - (array) (!nil) first array as a set of values (no duplicates)
  //   b - (array) (!nil) second array as a set of values (no duplicates)
  //
  // return: (array)
  //   the set of values (no duplicates) in array a and not in b, 
  //   listed in the same order as in array a.

  var hash = getHash(b);
  var diff = [];
  for (var i=0; i<a.length; i++){
    var value = a[i];
    if ( !hash[value]){
      diff.push(value);
    }
  }
  return diff;
}

#6


4  

Incorporating the idea from Christoph and assuming a couple of non-standard iteration methods on arrays and objects/hashes (each and friends), we can get set difference, union and intersection in linear time in about 20 lines total:

结合Christoph的想法,假设在数组和对象/散列(每个和朋友)上有一些非标准的迭代方法,我们可以在大约20行中得到设置差、合并和相交的线性时间:

var setOPs = {
  minusAB : function (a, b) {
    var h = {};
    b.each(function (v) { h[v] = true; });
    return a.filter(function (v) { return !h.hasOwnProperty(v); });
  },
  unionAB : function (a, b) {
    var h = {}, f = function (v) { h[v] = true; };
    a.each(f);
    b.each(f);
    return myUtils.keys(h);
  },
  intersectAB : function (a, b) {
    var h = {};
    a.each(function (v) { h[v] = 1; });
    b.each(function (v) { h[v] = (h[v] || 0) + 1; });
    var fnSel = function (v, count) { return count > 1; };
    var fnVal = function (v, c) { return v; };
    return myUtils.select(h, fnSel, fnVal);
  }
};

This assumes that each and filter are defined for arrays, and that we have two utility methods:

假设每个和过滤器都是为数组定义的,并且我们有两个实用方法:

  • myUtils.keys(hash): returns an array with the keys of the hash

    myUtils.keys(hash):返回一个具有散列的键的数组

  • myUtils.select(hash, fnSelector, fnEvaluator): returns an array with the results of calling fnEvaluator on the key/value pairs for which fnSelector returns true.

    myUtils。select(hash, fnSelector, fnEvaluator):返回一个数组,其结果是在fnSelector返回的键/值对上调用fnEvaluator。

The select() is loosely inspired by Common Lisp, and is merely filter() and map() rolled into one. (It would be better to have them defined on Object.prototype, but doing so wrecks havoc with jQuery, so I settled for static utility methods.)

select()受到了Common Lisp的松散启发,只是将filter()和map()合并为一个。(最好把它们定义在对象上。原型,但这样做破坏了jQuery,所以我选择了静态实用工具方法。

Performance: Testing with

性能:测试

var a = [], b = [];
for (var i = 100000; i--; ) {
  if (i % 2 !== 0) a.push(i);
  if (i % 3 !== 0) b.push(i);
}

gives two sets with 50,000 and 66,666 elements. With these values A-B takes about 75ms, while union and intersection are about 150ms each. (Mac Safari 4.0, using Javascript Date for timing.)

给出了包含50,000和66,666个元素的两个集合。A-B的值约为75ms,而union和crossroads的值约为150ms。(Mac Safari 4.0,使用Javascript日期计时)

I think that's decent payoff for 20 lines of code.

我认为这对20行代码来说是个不错的回报。

#7


3  

Using Underscore.js (Library for functional JS)

使用下划线。js(函数js库)

>>> var foo = [1,2,3]
>>> var bar = [1,2,4]
>>> _.difference(foo, bar);
[4]

#8


1  

This works, but I think another one is much more shorter, and elegant too

这是可行的,但我认为另一个更短,也更优雅

A = [1, 'a', 'b', 12];
B = ['a', 3, 4, 'b'];

diff_set = {
    ar : {},
    diff : Array(),
    remove_set : function(a) { ar = a; return this; },
    remove: function (el) {
        if(ar.indexOf(el)<0) this.diff.push(el);
    }
}

A.forEach(diff_set.remove_set(B).remove,diff_set);
C = diff_set.diff;

#9


1  

As for the fasted way, this isn't so elegant but I've run some tests to be sure. Loading one array as an object is far faster to process in large quantities:

至于固定方式,这并不是那么优雅,但我已经运行了一些测试来确定。将一个数组作为对象加载,处理大量数据要快得多:

var t, a, b, c, A;

    // Fill some arrays to compare
a = Array(30000).fill(0).map(function(v,i) {
    return i.toFixed();
});
b = Array(20000).fill(0).map(function(v,i) {
    return (i*2).toFixed();
});

    // Simple indexOf inside filter
t = Date.now();
c = b.filter(function(v) { return a.indexOf(v) < 0; });
console.log('completed indexOf in %j ms with result %j length', Date.now() - t, c.length);

    // Load `a` as Object `A` first to avoid indexOf in filter
t = Date.now();
A = {};
a.forEach(function(v) { A[v] = true; });
c = b.filter(function(v) { return !a[v]; });
console.log('completed Object in %j ms with result %j length', Date.now() - t, c.length);

Results:

结果:

completed indexOf in 1219 ms with result 5000 length
completed Object in 8 ms with result 5000 length

However, this works with strings only. If you plan to compare numbered sets you'll want to map results with parseInt.

但是,这只适用于字符串。如果您计划比较编号集,您将希望使用parseInt映射结果。

#10


1  

You can use this lightweight array-diff open source component.

您可以使用这个轻量级的array-diff开源组件。

Example:

例子:

diff([1,2,3], [1,2,3,4,5]) // => [4,5]

It works by concating the two arrays passed and filtering included vals, returning an array representing the difference between the two arrays:

它通过对所传递的两个数组进行处理,并过滤其中包含的vals,返回一个表示两个数组之间差异的数组:

function diff(firstArray: any[], secondArray: any[]): any[] {
  return firstArray.concat(secondArray).filter((val) => {
    return !(firstArray.includes(val) && secondArray.includes(val));
  });
};

#11


1  

Some simple functions, borrowing from @milan's answer:

一些简单的函数,借用@milan的回答:

const setDifference = (a, b) => new Set([...a].filter(x => !b.has(x)));
const setIntersection = (a, b) => new Set([...a].filter(x => b.has(x)));
const setUnion = (a, b) => new Set([...a, ...b]);

Usage:

用法:

const a = new Set([1, 2]);
const b = new Set([2, 3]);

setDifference(a, b); // Set { 1 }
setIntersection(a, b); // Set { 2 }
setUnion(a, b); // Set { 1, 2, 3 }