Generate all possible combinations from a list of ints under a limit

I need to do this in Python. There is a given list l, which may contain more than 5000 integer elements. There is a limit on the sum of the numbers: 20000, or possibly higher. The output should be all the possible sums of 2 numbers picked from the list, like:

l=[1,2,3,4,5,6,7,8,9]
output 
1+1,1+2,1+3,1+4,1+5,1+6...........
2+2,2+3,2+4.......
.........
.......

2,3,4,5,6... like that

This is the code I'm using for now, but it's slow:

l=listgen()
p=[]
for i in range(0,len(l)):
    for j in range(i,len(l)):
        k=l[i]+l[j]
        if k not in p:
            p.append(k)
p.sort
print(p)

listgen() is the function that generates the input list.

6 solutions

#1

Some old-fashioned optimization might get you faster code that's easier to grok than list comprehensions with multiple for loops:

def sums(lst, limit):    # prevent global lookups by using a function
    res = set()          # set membership testing is much faster than lists
    res_add = res.add    # cache add method
    for i, first in enumerate(lst):   # get index and item at the same time
        for second in lst[i:]:        # one copy operation saves n index ops.
            res_add(first + second)   # prevent creation/lookup of extra local temporary
    return sorted([x for x in res if x < limit])

print(sums(listgen(), 20000))

As an added bonus, this version will optimize beautifully with psyco, cython, etc.

Update: When comparing this to the other suggestions (replacing listgen with range(5000)), I get:

mine:        1.30 secs
WolframH:    2.65 secs
lazyr:       1.54 secs (estimate based on OPs timings -- I don't have Python 2.7 handy)

#2

EDIT: Thebjorn says he has the most efficient solution, and my own tests agree, though I've improved my performance a little. His code is also less dependent on Python version and seems to be very well thought out and explained with regard to optimization. You should accept his answer (and give him upvotes).

Use itertools.combinations_with_replacement (added in Python 2.7), and make p a set.

def sums(lst, limit):
    from itertools import combinations_with_replacement
    p = set(x + y for x, y in combinations_with_replacement(lst, 2))
    return sorted([x for x in p if x < limit])

Your code is slow because of this line:

if k not in p: # O(N) lookup time in lists vs average case O(1) in sets

If you just make a couple of small changes to your code so that p is a set, it would make a huge difference:

L = listgen()
p = set()
for i in range(0, len(L)):
    for j in range(i, len(L)):
        p.add(L[i] + L[j])
print(sorted(p))
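
To check that the set version really is a drop-in replacement, here is a minimal sketch that runs both variants on the small list from the question. The function names pair_sums_list and pair_sums_set are made up for this illustration; they are not from your code.

```python
def pair_sums_list(values):
    """Original approach: list plus a linear-time membership test."""
    p = []
    for i in range(len(values)):
        for j in range(i, len(values)):
            k = values[i] + values[j]
            if k not in p:          # scans the whole list every time
                p.append(k)
    p.sort()
    return p


def pair_sums_set(values):
    """Same loops, but the set discards duplicates in O(1) on average."""
    p = set()
    for i in range(len(values)):
        for j in range(i, len(values)):
            p.add(values[i] + values[j])
    return sorted(p)


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(pair_sums_list(sample) == pair_sums_set(sample))  # True
```

On an input this small the difference is not measurable; it only shows up once p grows to thousands of distinct sums.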

By the way, this line in your example

p.sort

has no effect. You must call a method to actually execute it, like so:

p.sort()
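
A tiny sketch of the difference, using a throwaway list rather than your data:

```python
p = [3, 1, 2]

p.sort        # only looks up the bound method; nothing is called
unsorted_copy = list(p)

p.sort()      # the parentheses actually call it and sort in place
print(unsorted_copy)  # [3, 1, 2]
print(p)              # [1, 2, 3]
```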

#3

Edit: Included the limit (which was not in the OP's code).

a = set(x + y for x in l for y in l)
print(sorted(x for x in a if x < limit))

That also reduces the complexity of the algorithm (yours is potentially O(n^4) because of the membership testing in a list).
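
Note that iterating over every ordered pair (x, y) does roughly twice as many additions as iterating over unordered pairs, but since addition is commutative the resulting set is the same. A small sketch to confirm that, where sums_all_pairs and sums_unordered_pairs are names invented for this comparison:

```python
from itertools import combinations_with_replacement


def sums_all_pairs(values, limit):
    """Every ordered pair (x, y): n * n additions."""
    a = set(x + y for x in values for y in values)
    return sorted(x for x in a if x < limit)


def sums_unordered_pairs(values, limit):
    """Each unordered pair once: n * (n + 1) / 2 additions."""
    a = set(x + y for x, y in combinations_with_replacement(values, 2))
    return sorted(x for x in a if x < limit)


sample = [5, 1, 9, 3]
print(sums_all_pairs(sample, 12))        # [2, 4, 6, 8, 10]
print(sums_unordered_pairs(sample, 12))  # [2, 4, 6, 8, 10]
```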

#4

If the input list is sorted, you can break out of the inner loop when you reach the limit. Also, make p a set.

lst=listgen()
lst.sort()
p=set()
for i in range(0,len(lst)):
    for j in range(i,len(lst)):
        k=lst[i]+lst[j]
        if k > limit:
            break
        p.add(k)
p = sorted(p)
print(p)
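
The same idea wrapped in a function, as a rough sketch: sums_sorted is a name I made up, and the extra outer break is my own addition (once first + first exceeds the limit, every remaining pair does too, because the list is sorted). Like the loop above, it keeps sums equal to the limit.

```python
def sums_sorted(values, limit):
    """Distinct pairwise sums <= limit, stopping early on sorted input."""
    lst = sorted(values)
    p = set()
    for i, first in enumerate(lst):
        if first + first > limit:
            break                  # all later pairs are at least this large
        for second in lst[i:]:
            k = first + second
            if k > limit:
                break              # sums only grow further along the row
            p.add(k)
    return sorted(p)


print(sums_sorted([9, 1, 5, 3], 10))  # [2, 4, 6, 8, 10]
```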

#5

You could use "NumPy" for this. This definitely gives you the required performance:

import numpy as np

data = np.arange(5000)
limit = 20000
result = np.zeros(0,dtype='i4')
for i in data:
    result = np.concatenate((result,data[i]+data[i:]))
    if len(result) >= limit: break
result = result[:limit]

EDIT: I just realized that the limit is on the sum and not on the number of elements. Then the code should read:

EDIT2: Found further logical errors. My corrected suggestion is:

for idx, x in np.ndenumerate(data):
    result = np.concatenate((result,x+data[idx[0]:]))
    if x + data[-1] >= limit: break
result = result[result <= limit]
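
For comparison, a fully vectorized sketch without the Python-level loop. This is an alternative I am suggesting, not the code above: sums_numpy is a made-up name, and it builds an n x n array of sums, so it assumes the filtered input is small enough for that to fit in memory (5000 values means 25 million entries).

```python
import numpy as np


def sums_numpy(values, limit):
    """Distinct pairwise sums <= limit via an outer sum."""
    data = np.unique(np.asarray(values))       # sorted, duplicates removed
    if data.size == 0:
        return data
    # A value can only take part in a valid sum if pairing it with the
    # smallest value stays within the limit.
    data = data[data + data[0] <= limit]
    totals = np.add.outer(data, data)           # all pairwise sums
    return np.unique(totals[totals <= limit])   # flattened, sorted, distinct


print(sums_numpy([1, 2, 3, 4, 5, 6, 7, 8, 9], 5).tolist())  # [2, 3, 4, 5]
```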

#6

If the list can contain repeated elements it might be a wise idea to get rid of them first, e.g. by converting the list to a set.
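
This is safe here because an element may be paired with itself (1+1, 2+2, ...), so dropping duplicates cannot lose any sum. A small sketch with made-up data; pair_sums is a helper name invented for the example:

```python
def pair_sums(values):
    """All sums values[i] + values[j] with i <= j, as a set."""
    return {values[i] + values[j]
            for i in range(len(values))
            for j in range(i, len(values))}


raw = [3, 1, 3, 7, 1, 1, 7]
unique = sorted(set(raw))                    # [1, 3, 7]

print(pair_sums(raw) == pair_sums(unique))   # True
print(sorted(pair_sums(unique)))             # [2, 4, 6, 8, 10, 14]
```

The nested loops then run over 3 values instead of 7, and the saving grows quadratically with the number of duplicates removed.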
