从一个数字列表中生成所有的唯一对,n选2。

时间:2021-11-03 19:26:02

i have a list of elements (let's say integers), and i need to make all possible 2-pair comparisons. my approach is O(n^2), and i am wondering if there is a faster way. here is my implementation in java.

我有一个元素列表(假设是整数),我需要做所有可能的2对比较。我的方法是O(n ^ 2),和我想知道如果有一个更快的方式。这是我在java中的实现。

public class Pair {
 public int x, y;
 public Pair(int x, int y) {
  this.x = x;
  this.y = y;
 }
}

public List<Pair> getAllPairs(List<Integer> numbers) {
 List<Pair> pairs = new ArrayList<Pair>();
 int total = numbers.size();
 for(int i=0; i < total; i++) {
  int num1 = numbers.get(i).intValue();
  for(int j=i+1; j < total; j++) {
   int num2 = numbers.get(j).intValue();
   pairs.add(new Pair(num1,num2));
  }
 }
 return pairs;
}

please note that i don't allow self-pairing, so there should be ((n(n+1))/2) - n possible pairs. what i have currently works, but as n increases, it is taking me an unbearable long amount of time to get the pairs. is there any way to turn the O(n^2) algorithm above to something sub-quadratic? any help is appreciated.

请注意,我不允许自配对,所以应该有(n(n+1))/2) - n可能对。我现在所做的工作,但是随着n的增加,我花费了很长一段时间来得到这些配对。有什么办法可以把上面的O(n ^ 2)算法来sub-quadratic吗?任何帮助都是感激。

by the way, i also tried the algorithm below, but when i benchmark, empirically, it performs worst than what i had above. i had thought that by avoiding an inner loop this would speed things up. shouldn't this algorithm below be faster? i would think that it's O(n)? if not, please explain and let me know. thanks.

顺便说一下,我也尝试了下面的算法,但是当我以经验为基准测试时,它的表现比我上面的要差。我曾经想过,通过避免内部循环,这将加速事情的发展。下面的算法不应该更快吗?我认为是O(n)如果没有,请解释并告诉我。谢谢。

public List<Pair> getAllPairs(List<Integer> numbers) {
 int n = list.size();
 int i = 0;
 int j = i + 1;
 while(true) {
  int num1 = list.get(i);
  int num2 = list.get(j);
  pairs.add(new Pair(num1,num2));

  j++;

  if(j >= n) {
   i++;
   j = i + 1;
  }

  if(i >= n - 1) {
   break;
  }
 }
}

2 个解决方案

#1


4  

You cannot make it sub-quadric, because as you said - the output is itself quadric - and to create it, you need at least #elements_in_output ops.

你不能使它为次二次曲面,因为正如你所说的——输出本身就是二次曲面——并且要创建它,你至少需要#elements_in_output操作。

However, you could do some "cheating" create your list on the fly:
You can create a class CombinationsGetter that implements Iterable<Pair>, and implement its Iterator<Pair>. This way, you will be able to iterate on the elements on the fly, without creating the list first, which might decrease latency for your application.

但是,您可以执行一些“欺骗”创建您的列表:您可以创建一个类组合sgetter来实现Iterable ,并实现其迭代器 。通过这种方式,您将能够在不首先创建列表的情况下,对动态的元素进行迭代,这可能会减少应用程序的延迟。

Note: It will still be quadric! The time to generate the list on the fly will just be distributed between more operations.

注意:它仍然是二次曲面!在飞行中生成列表的时间将在更多的操作之间分配。


EDIT: Another solution, which is faster then the naive approach - is multithreading.
Create a few threads, each will get his "slice" of the data - and generate relevant pairs, and create its own partial list.
Later - you can use ArrayList.addAll() to convert those different lists into one.
Note: though complexity is stiil O(n^2), it is likely to be much faster - since the creation of pairs is done in parallel, and ArrayList.addAll() is implemented much more effieciently then the trivial insert one by one elements.

编辑:另一种解决方案,它比单纯的方法更快,是多线程的。创建一些线程,每个线程将获得数据的“切片”,并生成相关的对,并创建自己的部分列表。稍后,您可以使用ArrayList.addAll()将这些不同的列表转换为一个列表。注意:尽管复杂度是stiil O(n 2),但它可能要快得多——因为成对的创建是并行完成的,而ArrayList.addAll()则更容易实现,然后由一个元素插入一个简单的插入。

EDIT2: Your second code is still O(n^2), even though it is a "single loop" - the loop itself will repeat O(n^2) times. Have a look at your variable i. It increases only when j==n, and it decreases j back to i+1 when it does it. So, it will result in n + (n-1) + ... + 1 iterations, and this is sum of arithmetic progression, and gets us back to O(n^2) as expected.

EDIT2:第二个代码仍然是O(n ^ 2),尽管这是一个“单回路”——循环本身将重复O(n ^ 2)次。看看你的变量i,它只在j==n时增加,当它这么做时,它会把j减小到i+1。结果是n + (n-1) +…+ 1迭代,这是等差数列,并让我们回O(n ^ 2)。

We cannot get better then O(n^2), because we are trying to create O(n^2) distinct Pair objects.

我们不能得到更好的O(n ^ 2),因为我们正在努力创建O(n ^ 2)不同的对象。

#2


6  

Well, you can't, right?

你不能,对吧?

The result is a list with n*(n-1)/2 elements, no matter what those elements are, just to populate this list (say with zeros) takes O(n^2) time, since n*(n-1)/2 = O(n^2)...

结果是一个包含n*(n-1)/2元素的列表,不管这些元素是什么,只要填入这个列表(用0表示)就可以得到O(n 2)时间,因为n*(n-1)/2 = O(n 2)…

#1


4  

You cannot make it sub-quadric, because as you said - the output is itself quadric - and to create it, you need at least #elements_in_output ops.

你不能使它为次二次曲面,因为正如你所说的——输出本身就是二次曲面——并且要创建它,你至少需要#elements_in_output操作。

However, you could do some "cheating" create your list on the fly:
You can create a class CombinationsGetter that implements Iterable<Pair>, and implement its Iterator<Pair>. This way, you will be able to iterate on the elements on the fly, without creating the list first, which might decrease latency for your application.

但是,您可以执行一些“欺骗”创建您的列表:您可以创建一个类组合sgetter来实现Iterable ,并实现其迭代器 。通过这种方式,您将能够在不首先创建列表的情况下,对动态的元素进行迭代,这可能会减少应用程序的延迟。

Note: It will still be quadric! The time to generate the list on the fly will just be distributed between more operations.

注意:它仍然是二次曲面!在飞行中生成列表的时间将在更多的操作之间分配。


EDIT: Another solution, which is faster then the naive approach - is multithreading.
Create a few threads, each will get his "slice" of the data - and generate relevant pairs, and create its own partial list.
Later - you can use ArrayList.addAll() to convert those different lists into one.
Note: though complexity is stiil O(n^2), it is likely to be much faster - since the creation of pairs is done in parallel, and ArrayList.addAll() is implemented much more effieciently then the trivial insert one by one elements.

编辑:另一种解决方案,它比单纯的方法更快,是多线程的。创建一些线程,每个线程将获得数据的“切片”,并生成相关的对,并创建自己的部分列表。稍后,您可以使用ArrayList.addAll()将这些不同的列表转换为一个列表。注意:尽管复杂度是stiil O(n 2),但它可能要快得多——因为成对的创建是并行完成的,而ArrayList.addAll()则更容易实现,然后由一个元素插入一个简单的插入。

EDIT2: Your second code is still O(n^2), even though it is a "single loop" - the loop itself will repeat O(n^2) times. Have a look at your variable i. It increases only when j==n, and it decreases j back to i+1 when it does it. So, it will result in n + (n-1) + ... + 1 iterations, and this is sum of arithmetic progression, and gets us back to O(n^2) as expected.

EDIT2:第二个代码仍然是O(n ^ 2),尽管这是一个“单回路”——循环本身将重复O(n ^ 2)次。看看你的变量i,它只在j==n时增加,当它这么做时,它会把j减小到i+1。结果是n + (n-1) +…+ 1迭代,这是等差数列,并让我们回O(n ^ 2)。

We cannot get better then O(n^2), because we are trying to create O(n^2) distinct Pair objects.

我们不能得到更好的O(n ^ 2),因为我们正在努力创建O(n ^ 2)不同的对象。

#2


6  

Well, you can't, right?

你不能,对吧?

The result is a list with n*(n-1)/2 elements, no matter what those elements are, just to populate this list (say with zeros) takes O(n^2) time, since n*(n-1)/2 = O(n^2)...

结果是一个包含n*(n-1)/2元素的列表,不管这些元素是什么,只要填入这个列表(用0表示)就可以得到O(n 2)时间,因为n*(n-1)/2 = O(n 2)…