算法:从n个数组(队列)中找到k个数的最小和

Suppose there are n queues of positive numbers. I need to minimum sum of k numbers selected from these queues. Note that these are queues so ordering is important and only first number can be selected from any queue at a time. Once that number is selected and removed from queue we can proceed to the next one in that queue. So sorting is not allowed(order is important).

假设有n个正数队列。我需要从这些队列中选出k个数字的最小和。注意，这些是队列，所以排序很重要，每次只能从任何队列中选择第一个数字。一旦选择了这个号码并从队列中删除，我们就可以进入队列中的下一个队列。所以排序是不允许的(顺序很重要)。

For example:

例如:

Find minimum sum of two numbers

求两个数的最小和

2 12 3 4 8 2 210 10

In the above example I can either choose 2 from first queue and 8 from second or 8 and 2 both from second. Both choices give sum 10.

在上面的示例中，我可以从第一个队列中选择2，从第二个队列中选择8，也可以从第二个队列中选择8和2。两个选项都是10的和。

Example 2:

示例2:

Find minimum sum of two numbers

求两个数的最小和

4 15 28 2 210 10

In the above example one has to choose 8 and 2 both from second list.

在上面的示例中，必须从第二个列表中同时选择8和2。

I was first thinking on the line of merge K sorted lists but it won't work. I can only think of one working approach for this. It is to try all the combinations from all the queues. Can someone suggest a better way or guide me towards it?

我第一次想到的是归并K排序列表，但它行不通。我只能想到一个有效的方法。它是尝试所有队列中的所有组合。有没有人能给我一个更好的建议或者指导我?

2 个解决方案

#1

Let F(qs, k) be the minimum sum from choosing k numbers from the queues qs. Then:

让F(qs, k)是从队列中选择k个数字的最小值。然后:

F([], k) = 0 if k == 0, otherwise +infinity.F([q] + qs, k) = min_i(q[0] + q[1] + ... + q[i-1] + F(qs, k-i) for i in 0...k)

That is, if you've no queues left, the min sum is 0, otherwise, you can take i numbers from the first queue, and k-i from the rest.

也就是说，如果没有队列，最小和为0，否则，可以从第一个队列中取i号，从其余队列中取k-i号。

This can be solved efficiently using dynamic programming by building a table of (n, k) where n is the number of queues. In Python 2:

使用动态编程可以有效地解决这个问题，方法是构建一个(n, k)的表，其中n是队列的数量。在Python中2:

def F(qs, n, k, cache):    if k == 0:        return 0    if n == 0:        return 1e12    if (n, k) not in cache:        best = 1e12        s = 0        for i in xrange(k+1):            if i > len(qs[len(qs)-n]):                break            if i > 0:                s += qs[len(qs)-n][i-1]            best = min(best, s + F(qs, n-1, k-i, cache))        cache[n, k] = best    return cache[n, k]egs = [    (2, [[2, 2, 3, 4], [8, 2, 2], [10, 10]]),    (2, [[4, 15, 2], [8, 2, 2], [10, 10]]),    (3, [[100, 100, 100], [101, 101, 2]])]for k, qs in egs:    print k, qs    print F(qs, len(qs), k, dict())    print

Prints

打印

2 [[2, 2, 3, 4], [8, 2, 2], [10, 10]]42 [[4, 15, 2], [8, 2, 2], [10, 10]]103 [[100, 100, 100], [101, 101, 2]]204

#2

First try solving a simpler problem: How to find the smallest k elements from an array length m?

首先尝试解决一个更简单的问题:如何从数组长度m中找到最小的k元素?

Initialise a max-heap size k from the first k elements of the array (yes max-heap rather than min-heap). Loop over the rest of the array. At each step, compare the current element with the root of the heap (this is the kth smallest element seen so far). If the current element is smaller, remove the heap root and insert the current element, being careful to maintain the heap-invariant.

从数组的前k个元素初始化最大堆大小k(是的，最大堆而不是最小堆)。循环数组的其余部分。在每个步骤中，将当前元素与堆的根进行比较(这是到目前为止看到的第k个最小的元素)。如果当前元素比较小，则删除堆根元素并插入当前元素，小心地维护heap不变量。

When done, the heap contains the smallest k elements from the array. The algorithm has time complexity O(m log k) and space complexity O(k)

完成后，堆包含数组中最小的k个元素。该算法具有时间复杂度O(m log k)和空间复杂度O(k)

Implementation in Python. Python has only a min-heap module, so emulate a max-heap by taking the negative of everything.

在Python中实现。Python只有一个小堆模块，所以通过取所有的负数来模拟一个max-heap。

import heapq # min-heap operations def sum_smallest_k(queues, k):    """Sum the smallest k elements across queues"""    heap = [] # maintain a max-heap size k    for queue in queues:        for x in queue:            if len(heap) < k:                heapq.heappush(heap, -1 * x)            else:                heapq.heappushpop(heap, -1 * x)    return -1 * sum(heap)

Your examples

你的例子

>>> sum_smallest_k([[2, 12, 3, 4], [8, 2, 2], [10, 10]], 2)4>>> sum_smallest_k([[4, 15, 2], [8, 2, 2], [10, 10]], 2)4

#1