Problem

Say you have n lists of integers, where each list only includes integers in the range from 1 to n. For example, for n = 4, we might have:

假设有n个整数列表，其中每个列表只包含从1到n的整数。

a_1 = [1, 2]
a_2 = [3]
a_3 = [4, 1, 1]
a_4 = [2, 3]

Now my question is: Can I tick off all the integers between 1 and n in those n lists but with the catch that once I find one number, I can't use that list anymore to find subsequent numbers?

现在我的问题是:我能勾掉n个列表中1到n之间的所有整数吗?但是有一个问题，一旦我找到一个数字，我就不能再用这个列表来找到后面的数字了?

For example, in the above example with n = 4, I can choose 1 from a_1, 2 from a_4, 3 from a_2, and 4 for a_3, and I have therefore filled all the numbers from 1 to 4 but only using each list once.

例如，在上面n = 4的例子中，我可以从a_1中选择1，从a_4中选择2，从a_2中选择3，从a_3中选择4，因此我将所有的数字从1填充到4，但只使用每个列表一次。

An example where I can't find the range (and hence should return False) would be:

一个我找不到范围(因此应该返回False)的例子是:

a_1 = [1, 2]
a_2 = [3, 3, 5]
a_3 = [4]
a_4 = [5]
a_5 = [3, 4, 5]

The reason is because if I choose 1 from a_1, I can't choose 2 anymore from any list.

原因是如果我从a_1中选择1，我不能从任何列表中选择2。

Approach

This is my current straightforward approach. I make a cartesian product of the lists and check if there is any that, sorted, will be a range.

这是我目前的直接方法。我做了一个笛卡尔积的列表，检查是否有，排序，将是一个范围。

import itertools

def fillRange(lists):
  cartesian = itertools.product(*lists)
  return any([sorted(comb) == range(1, len(lists) + 1) for comb in cartesian])

Question

Although my approach works, for large lists it becomes a bit inefficient. Any thoughts on how I can improve this algorithm?

虽然我的方法是有效的，但是对于大的列表，它变得有点低效。对于如何改进这个算法有什么想法吗?

Thanks!

谢谢!

5 个解决方案

#1

You can formulate this as a maximum flow problem in a bipartite graph where the left nodes correspond to lists, and the right nodes correspond to integers 1 to n.

你可以把它表示成一个二部图中的最大流量问题，左边的节点对应列表，右边的节点对应1到n的整数。

There is an edge in the graph iff the integer is in the corresponding list.

图中有一条边，这个整数在相应的列表中。

All capacities in the graph are equal to 1.

图中的所有容量都等于1。

If you can find a flow of size n from the left side to the right side then the problem is soluble.

如果你能找到一个大小为n的流从左边到右边，那么问题是可以解决的。

Python code to do this below:

Python代码实现以下目的:

import networkx as nx

a_1 = [1, 2]
a_2 = [2]
a_3 = [4, 1, 1]
a_4 = [2, 3]
A = [a_1,a_2,a_3,a_4]
n = 4

G=nx.DiGraph()
for i,a in enumerate(A):
    for j in set(a):
        l = 'list'+str(i)
        G.add_edge(l,j,capacity=1)
        G.add_edge('start',l,capacity=1)
for j in range(1,n+1):
    G.add_edge(j,'dest',capacity=1)
v,flow = nx.maximum_flow(G,'start','dest')
if v<n:
    print 'Impossible'
else:
    for i,a in enumerate(A):
        for j in set(a):
            if flow['list'+str(i)][j]>0:
                print 'Use',j,'from list',a

This prints:

这个打印:

Use 1 from list [1, 2]
Use 2 from list [2]
Use 4 from list [4, 1, 1]
Use 3 from list [2, 3]

#2

Instead of testing all the combinations in order, you can speed this up a lot by testing the most-constrained lists first, and also updating the alternatives in the other lists as you add elements to your solution set. This way, you can "solve" both your examples without backtracking once.

不按顺序测试所有组合，您可以先测试最受约束的列表，并在向解决方案集添加元素时更新其他列表中的备选方案，从而大大加快速度。

def search(n, lists):
    if n == 0:
        yield []
    else:
        lists = [l for l in lists if l != []]
        if len(lists) >= n:
            least = min(lists, key=len)
            for val in least:
                new = [[x for x in lst if x != val] for lst in lists if lst is not least]
                for res in search(n-1, new):
                    yield [val] + res

Here's some debugging/tracing output for your two examples to help with the understanding. First value is n, then the lists, and finally the previously chosen val.

下面是您的两个示例的一些调试/跟踪输出，以帮助您理解。第一个值是n，然后是列表，最后是先前选择的val。

4 [[1, 2], [3], [4, 1, 1], [2, 3]] None
3 [[1, 2], [4, 1, 1], [2]] 3
2 [[1], [4, 1, 1]] 2
1 [[4]] 1
0 [] 4 --> solution
[3, 2, 1, 4]

5 [[1, 2], [3, 3, 5], [4], [5], [3, 4, 5]] None
4 [[1, 2], [3, 3, 5], [5], [3, 5]] 4
3 [[1, 2], [3, 3], [3]] 5
2 [[1, 2], []] 3 --> abort

If you also want the indices of the lists the elements have been taken from, the code gets a little more complicated, but not much:

如果您还想要元素从列表中获取的索引，那么代码会变得稍微复杂一点，但不会太复杂:

def search(n, lists):
    if n == 0:
        yield []
    else:
        if sum(1 for l in lists if l) >= n:
            i = min(range(len(lists)), key=lambda x: (lists[x] == [], len(lists[x])))
            for val in lists[i]:
                new = [[x for x in lst if x != val] if lst is not lists[i] else [] for lst in lists]
                for res in search(n-1, new):
                    yield [(i, val)] + res

Result for your first example then is [(1, 3), (3, 2), (0, 1), (2, 4)]

第一个例子的结果是[(1,3)，(3,2)，(0,1)，(2,4)]

#3

The cartesian product seems the most straightforward to me. I would do the following to streamline your code:

笛卡尔积对我来说是最直接的。为了简化您的代码，我将做以下工作:

remove []'s from your any expression as I mentioned in the comments

如我在评论中提到的那样，从你的任何表达中删除[]。
collapse all input lists to sets before computing cartesian product - there is no point in processing duplicate values from the same list

在计算笛卡尔积之前，将所有输入列表折叠成集合——处理来自相同列表的重复值是没有意义的
save range(1, len(lists)+1) to a local variable and compare with that instead of recreating the range each time (this is a common optimization technique called "invariant lifting", in which a computed expression that doesn't change during the loop is "lifted" out of the loop and just computed once)

将range(1, len(list)+1)保存到一个局部变量中，并与之进行比较，而不是每次都重新创建范围(这是一种称为“不变提升”的常见优化技术，在这种技术中，在循环过程中不改变的计算表达式被从循环中“提升”出来，只计算一次)

But ultimately, the basic algorithm of computing a cartesian of your input lists, and then looking for any which are the values 1-n is still as you originally wrote.

但最终，计算输入列表笛卡尔坐标的基本算法，然后寻找值为1-n的值仍然是你最初写的。

def fillRange(lists):
  cartesian = itertools.product(*(set(x) for x in lists))
  target = list(range(1, len(lists) + 1))
  return any(sorted(comb) == target for comb in cartesian)

#4

This can be seen as a problem of matching in a bipartite graph. As it turns out, Hall's marriage theorem tells you the answer (that is, whether the matching exists, not the matching itself). Here is a possible implementation (using NumPy for convenience):

这可以看作是二部图中的匹配问题。事实证明，霍尔的婚姻定理告诉你答案(即，匹配是否存在，而不是匹配本身)。这里有一个可能的实现(为了方便使用NumPy):

from itertools import chain, combinations
import numpy as np

# Recipe from itertools docs: https://docs.python.org/3/library/itertools.html#itertools-recipes
def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))


def has_matching(lists):
    n = len(lists)
    m = np.array([np.logical_or.reduce(np.arange(1, n + 1)[:, np.newaxis] == lst, axis=1)
                  for lst in lists])
    m = m.astype(int)
    for s in powerset(range(n)):
        if np.sum(np.logical_or.reduce(m[s, :], axis=0)) < len(s):
            return False
    return True


lists1 = [[1, 2],
          [3],
          [4, 1, 1],
          [2, 3]]
print(has_matching(lists1))
>>> True

lists2 = [[1, 2],
          [3, 3, 5],
          [4],
          [5],
          [3, 4, 5]]
print(has_matching(lists2))
>>> False

However, this requires you to go through every subset of {1, ..., n}, so I guess the algorithm is O(2^N). Not great, but maybe better than going through the whole Cartesian product, which I guess would be O(N^N).

但是，这要求您遍历{1，…我猜算法是O(2N)不是很好，但可能比整个笛卡尔积更好，我猜应该是O(NN)

#5

You could try mapping which values occur in which list, and decompose your problem from there. This code builds that sort of reverse-lookup:

您可以尝试映射出现在哪个列表中的值，然后从那里分解问题。这段代码构建了这种反向查找:

In[38]: from collections import defaultdict
In[39]: occurrences = defaultdict(set)
In[40]: for i,ll in enumerate(lists):
   ...:     for x in ll:
   ...:         occurrences[x].add(i)
   ...:         
In[41]: print(occurrences)
defaultdict(<class 'set'>, {1: {0, 2}, 2: {0, 3}, 3: {1, 3}, 4: {2}})
In[42]: print(dict(occurrences.items()))
{1: {0, 2}, 2: {0, 3}, 3: {1, 3}, 4: {2}}

For instance, you can see at a glance, that 4 exists only in list[2] (that is a_3 in your original question). From there, if you eliminate 2 from the other values, 1 exists only in list[0]. Eliminating 0 shows that 2 only exists in list[3], and 3 then can only be gotten from list[1]. If in doing this successive elimination, any of the sets to select from becomes empty, then there is no solution.

例如，您可以一眼看到，4只存在于清单[2]中(在您最初的问题中是a_3)。从这里开始，如果从其他值中除去2，则1只存在于清单[0]中。消去0表示2只存在于list[3]中，3则只能从list[1]中获得。如果在进行这个连续的消去时，要选择的任何集合都变成空的，那么就没有解决方案。

#1