查找所有k-size的子集和n-size袋子的重复的未排序正整数的和

时间:2021-11-28 02:21:48

Please note that this is required for a C# .NET 2.0 project (Linq not allowed).

请注意,这是c# . net 2.0项目所需要的(Linq不允许)。

I know very similar questions have been asked here and I have already produce some working code (see below) but still would like an advice as to how to make the algorithm faster given k and s conditions.

我知道这里有很多类似的问题,我已经给出了一些工作代码(见下面),但是仍然想要一个关于如何让算法更快地给出k和s条件的建议。

This is what I've learnt so far: Dynamic programming is the most efficient way to finding ONE (not all) subsets. Please correct me if I am wrong. And is there a way of repeatedly calling the DP code to produce newer subsets till the bag (set with duplicates) is exhausted?

这是我到目前为止学到的:动态编程是找到一个(不是所有)子集的最有效方法。如果我说错了,请纠正我。是否有一种方法可以重复调用DP代码来生成新的子集,直到包(带有副本)耗尽为止?

If not, then is there a way that may speed up the backtracking recursive algorithm I have below which does produce what I need but runs in O(2^n), I think, by taking s and k into account?

如果不是,那么有没有一种方法,可能会加速回溯递归算法我有下面这生产我需要但在O(2 ^ n),我认为,考虑到年代和k ?

Here is my fixed bag of numbers that will NEVER change with n=114 and number range from 3 to 286:

这是我的固定的数字包,n=114不会改变,数字范围从3到286:

    int[] numbers = new int[]
    {
        7, 286, 200, 176, 120, 165, 206, 75, 129, 109,
        123, 111, 43, 52, 99, 128, 111, 110, 98, 135,
        112, 78, 118, 64, 77, 227, 93, 88, 69, 60,
        34, 30, 73, 54, 45, 83, 182, 88, 75, 85,
        54, 53, 89, 59, 37, 35, 38, 29, 18, 45,
        60, 49, 62, 55, 78, 96, 29, 22, 24, 13,
        14, 11, 11, 18, 12, 12, 30, 52, 52, 44,
        28, 28, 20, 56, 40, 31, 50, 40, 46, 42,
        29, 19, 36, 25, 22, 17, 19, 26, 30, 20,
        15, 21, 11, 8, 8, 19, 5, 8, 8, 11,
        11, 8, 3, 9, 5, 4, 7, 3, 6, 3,
        5, 4, 5, 6
    };

Requirements

需求

  • Space limit to 2-3GB max but time should be O(n^something) not (something^n).

    2-3GB max但是时间空间限制应该是O(n ^)(^ n)。

  • The bag must not be sorted and duplicate must not be removed.

    不能对包进行排序,不能删除重复的包。

  • The result should be the indices of the numbers in the matching subset, not the numbers themselves (as we have duplicates).

    结果应该是匹配子集中的数字的索引,而不是数字本身(因为我们有重复的数据)。

Dynamic Programming Attempt

动态规划的尝试

Here is the C# dynamic programming version adapted from an answer to a similar question here on *.com:

以下是c#动态编程版本,改编自*.com上一个类似问题的答案:

using System;
using System.Collections.Generic;

namespace Utilities
{
    public static class Combinations
    {
        private static Dictionary<int, bool> m_memo = new Dictionary<int, bool>();
        private static Dictionary<int, KeyValuePair<int, int>> m_previous = new Dictionary<int, KeyValuePair<int, int>>();
        static Combinations()
        {
            m_memo.Clear();
            m_previous.Clear();
            m_memo[0] = true;
            m_previous[0] = new KeyValuePair<int, int>(-1, 0);

        }

        public static bool FindSubset(IList<int> set, int sum)
        {
            //m_memo.Clear();
            //m_previous.Clear();
            //m_memo[0] = true;
            //m_previous[0] = new KeyValuePair<int, int>(-1, 0);

            for (int i = 0; i < set.Count; ++i)
            {
                int num = set[i];
                for (int s = sum; s >= num; --s)
                {
                    if (m_memo.ContainsKey(s - num) && m_memo[s - num] == true)
                    {
                        m_memo[s] = true;

                        if (!m_previous.ContainsKey(s))
                        {
                            m_previous[s] = new KeyValuePair<int, int>(i, num);
                        }
                    }
                }
            }

            return m_memo.ContainsKey(sum) && m_memo[sum];
        }
        public static IEnumerable<int> GetLastIndex(int sum)
        {
            while (m_previous[sum].Key != -1)
            {
                yield return m_previous[sum].Key;
                sum -= m_previous[sum].Value;
            }
        }

        public static void SubsetSumMain(string[] args)
        {
            int[] numbers = new int[]
        {
            7, 286, 200, 176, 120, 165, 206, 75, 129, 109,
            123, 111, 43, 52, 99, 128, 111, 110, 98, 135,
            112, 78, 118, 64, 77, 227, 93, 88, 69, 60,
            34, 30, 73, 54, 45, 83, 182, 88, 75, 85,
            54, 53, 89, 59, 37, 35, 38, 29, 18, 45,
            60, 49, 62, 55, 78, 96, 29, 22, 24, 13,
            14, 11, 11, 18, 12, 12, 30, 52, 52, 44,
            28, 28, 20, 56, 40, 31, 50, 40, 46, 42,
            29, 19, 36, 25, 22, 17, 19, 26, 30, 20,
            15, 21, 11, 8, 8, 19, 5, 8, 8, 11,
            11, 8, 3, 9, 5, 4, 7, 3, 6, 3,
            5, 4, 5, 6
        };

            int sum = 400;
            //int size = 4; // don't know to use in dynamic programming

            // call dynamic programming
            if (Numbers.FindSubset(numbers, sum))
            {
                foreach (int index in Numbers.GetLastIndex(sum))
                {
                    Console.Write((index + 1) + "." + numbers[index] + "\t");
                }
                Console.WriteLine();
            }
            Console.WriteLine();

            Console.ReadKey();
        }
    }
}

Recursive Programming Attempt

递归程序设计的尝试

and Here is the C# recursive programming version adapted from an answer to a similar question here on *.com:

这里是c#递归编程版本,改编自*.com上一个类似问题的答案:

using System;
using System.Collections.Generic;

namespace Utilities
{
    public static class Combinations
    {
        private static int s_count = 0;
        public static int CountSubsets(int[] numbers, int index, int current, int sum, int size, List<int> result)
        {
            if ((numbers.Length <= index) || (current > sum)) return 0;
            if (result == null) result = new List<int>();

            List<int> temp = new List<int>(result);
            if (current + numbers[index] == sum)
            {
                temp.Add(index);
                if ((size == 0) || (temp.Count == size))
                {
                    s_count++;
                }
            }
            else if (current + numbers[index] < sum)
            {
                temp.Add(index);
                CountSubsets(numbers, index + 1, current + numbers[index], sum, size, temp);
            }

            CountSubsets(numbers, index + 1, current, sum, size, result);
            return s_count;
        }

        private static List<List<int>> m_subsets = new List<List<int>>();
        public static List<List<int>> FindSubsets(int[] numbers, int index, int current, int sum, int size, List<int> result)
        {
            if ((numbers.Length <= index) || (current > sum)) return m_subsets;
            if (result == null) result = new List<int>();

            List<int> temp = new List<int>(result);
            if (current + numbers[index] == sum)
            {
                temp.Add(index);
                if ((size == 0) || (temp.Count == size))
                {
                    m_subsets.Add(temp);
                }
            }
            else if (current + numbers[index] < sum)
            {
                temp.Add(index);
                FindSubsets(numbers, index + 1, current + numbers[index], sum, size, temp);
            }

            FindSubsets(numbers, index + 1, current, sum, size, result);

            return m_subsets;
        }

        public static void SubsetSumMain(string[] args)
        {
            int[] numbers = new int[]
        {
            7, 286, 200, 176, 120, 165, 206, 75, 129, 109,
            123, 111, 43, 52, 99, 128, 111, 110, 98, 135,
            112, 78, 118, 64, 77, 227, 93, 88, 69, 60,
            34, 30, 73, 54, 45, 83, 182, 88, 75, 85,
            54, 53, 89, 59, 37, 35, 38, 29, 18, 45,
            60, 49, 62, 55, 78, 96, 29, 22, 24, 13,
            14, 11, 11, 18, 12, 12, 30, 52, 52, 44,
            28, 28, 20, 56, 40, 31, 50, 40, 46, 42,
            29, 19, 36, 25, 22, 17, 19, 26, 30, 20,
            15, 21, 11, 8, 8, 19, 5, 8, 8, 11,
            11, 8, 3, 9, 5, 4, 7, 3, 6, 3,
            5, 4, 5, 6
        };

            int sum = 17;
            int size = 2;

            // call backtracking recursive programming
            Console.WriteLine("CountSubsets");
            int count = Numbers.CountSubsets(numbers, 0, 0, sum, size, null);
            Console.WriteLine("Count = " + count);
            Console.WriteLine();

            // call backtracking recursive programming
            Console.WriteLine("FindSubsets");
            List<List<int>> subsets = Numbers.FindSubsets(numbers, 0, 0, sum, size, null);
            for (int i = 0; i < subsets.Count; i++)
            {
                if (subsets[i] != null)
                {
                    Console.Write((i + 1).ToString() + ":\t");
                    for (int j = 0; j < subsets[i].Count; j++)
                    {
                        int index = subsets[i][j];
                        Console.Write((index + 1) + "." + numbers[index] + " ");
                    }
                    Console.WriteLine();
                }
            }
            Console.WriteLine("Count = " + subsets.Count);

            Console.ReadKey();
        }
    }
}

Please let me know how to restrict the dynamic programming version to subsets of size k and if I can call it repeatedly so it returns different subsets on every call until there are no more matching subsets.

请让我知道如何将动态编程版本限制为大小为k的子集,如果我可以反复调用它,它会在每次调用时返回不同的子集,直到没有匹配的子集为止。

Also I am not sure where to initialize the memo of the DP algorithm. I did it in the static constructor which auto-runs when accessing any method. Is this the correct initialization place or does it need to be moved to inside the FindSunset() method [commented out]?

我也不确定在哪里初始化DP算法的备忘录。我是在静态构造函数中做的,它在访问任何方法时自动运行。这是正确的初始化位置,还是需要将其移动到FindSunset()方法[注释掉]中?

As for the recursive version, is it backtracking? and how can we speed it up. It works correctly and takes k and s into account but totally inefficient.

至于递归版本,是回溯吗?我们怎样才能加快速度呢?它可以正常工作,并且考虑到k和s,但效率非常低。

Let's make this thread the mother of all C# SubsetSum related questions!

让我们让这个线程成为所有c#子setsum相关问题的母亲!

3 个解决方案

#1


2  

My previous answer works on the principle of cutting off the number of combinations to check. But this can be improved significantly once you sort the array. The principle is similar, but since the solution is entirely different, I've decided to put it in a separate answer.

我之前的回答是关于减少组合数量的原则。但是,一旦对数组进行排序,这一点可以得到显著改善。原理是相似的,但是由于解决方案是完全不同的,我决定把它放在一个单独的答案中。

I was careful to use only .Net Framework 2.0 features. Might add a visual explanation later, but the comments should be enough.

我谨慎地只使用。net Framework 2.0特性。稍后可能会添加一个可视的解释,但是注释应该足够了。

class Puzzle
{
    private readonly int[] _tailSums;
    public readonly SubsetElement[] Elements;
    public readonly int N;

    public Puzzle(int[] numbers)
    {
        // Set N and make Elements array
        // (to remember the original index of each element)
        this.N = numbers.Length;
        this.Elements = new SubsetElement[this.N];

        for (var i = 0; i < this.N; i++)
        {
            this.Elements[i] = new SubsetElement(numbers[i], i);
        }

        // Sort Elements descendingly by their Number value
        Array.Sort(this.Elements, (a, b) => b.Number.CompareTo(a.Number));

        // Save tail-sums to allow immediate access by index
        // Allow immedate calculation by index = N, to sum 0
        this._tailSums = new int[this.N + 1];
        var sum = 0;

        for (var i = this.N - 1; i >= 0; i--)
        {
            this._tailSums[i] = sum += this.Elements[i].Number;
        }
    }

    public void Solve(int s, Action<SubsetElement[]> callback)
    {
        for (var k = 1; k <= this.N; k++)
            this.Solve(k, s, callback);
    }

    public void Solve(int k, int s, Action<SubsetElement[]> callback)
    {
        this.ScanSubsets(0, k, s, new List<SubsetElement>(), callback);
    }

    private void ScanSubsets(int startIndex, int k, int s,
                             List<SubsetElement> subset, Action<SubsetElement[]> cb)
    {
        // No more numbers to add, and current subset is guranteed to be valid
        if (k == 0)
        {
            // Callback with current subset
            cb(subset.ToArray());
            return;
        }

        // Sum the smallest k elements
        var minSubsetStartIndex = this.N - k;
        var minSum = this._tailSums[minSubssetStartIndex];

        // Smallest possible sum is greater than wanted sum,
        // so a valid subset cannot be found
        if (minSum > s)
        {
            return;
        }

        // Find largest number that satisfies the condition
        // that a valid subset can be found
        minSum -= this.Elements[minSubsetStartIndex].Number;

        // But remember the last index that satisfies the condition
        var minSubsetEndIndex = minSubsetStartIndex;

        while (minSubsetStartIndex > startIndex &&
               minSum + this.Elements[minSubsetStartIndex - 1].Number <= s)
        {
            minSubsetStartIndex--;
        }

        // Find the first number in the sorted sequence that is
        // the largest number we just found (in case of duplicates)
        while (minSubsetStartIndex > startIndex &&
               Elements[minSubsetStartIndex] == Elements[minSubsetStartIndex - 1])
        {
            minSubsetStartIndex--;
        }

        // [minSubsetStartIndex .. maxSubsetEndIndex] is the
        // full range we must check in recursion

        for (var subsetStartIndex = minSubsetStartIndex;
             subsetStartIndex <= minSubsetEndIndex;
             subsetStartIndex++)
        {
            // Find the largest possible sum, which is the sum of the
            // k first elements, starting at current subsetStartIndex
            var maxSum = this._tailSums[subsetStartIndex] -
                         this._tailSums[subsetStartIndex + k];

            // The largest possible sum is less than the wanted sum,
            // so a valid subset cannot be found
            if (maxSum < s)
            {
                return;
            }

            // Add current number to the subset
            var x = this.Elements[subsetStartIndex];
            subset.Add(x);

            // Recurse through the sub-problem to the right
            this.ScanSubsets(subsetStartIndex + 1, k - 1, s - x.Number, subset, cb);

            // Remove current number and continue loop
            subset.RemoveAt(subset.Count - 1);
        }
    }

    public sealed class SubsetElement
    {
        public readonly int Number;
        public readonly int Index;

        public SubsetElement(int number, int index)
        {
            this.Number = number;
            this.Index = index;
        }

        public override string ToString()
        {
            return string.Format("{0}({1})", this.Number, this.Index);
        }
    }
}

Usage and performance testing:

使用和性能测试:

private static void Main()
{
    var sw = Stopwatch.StartNew();
    var puzzle = new Puzzle(new[]
        {
            7, 286, 200, 176, 120, 165, 206, 75, 129, 109,
            123, 111, 43, 52, 99, 128, 111, 110, 98, 135,
            112, 78, 118, 64, 77, 227, 93, 88, 69, 60,
            34, 30, 73, 54, 45, 83, 182, 88, 75, 85,
            54, 53, 89, 59, 37, 35, 38, 29, 18, 45,
            60, 49, 62, 55, 78, 96, 29, 22, 24, 13,
            14, 11, 11, 18, 12, 12, 30, 52, 52, 44,
            28, 28, 20, 56, 40, 31, 50, 40, 46, 42,
            29, 19, 36, 25, 22, 17, 19, 26, 30, 20,
            15, 21, 11, 8, 8, 19, 5, 8, 8, 11,
            11, 8, 3, 9, 5, 4, 7, 3, 6, 3,
            5, 4, 5, 6
        });

    puzzle.Solve(2, 17, PuzzleOnSubsetFound);

    sw.Stop();
    Console.WriteLine("Subsets found: " + _subsetsCount);
    Console.WriteLine(sw.Elapsed);
}

private static int _subsetsCount;

private static void PuzzleOnSubsetFound(Puzzle.SubsetElement[] subset)
{
    _subsetsCount++;
    return; // Skip prints when speed-testing

    foreach (var el in subset)
    {
        Console.Write(el.ToString());
        Console.Write("  ");
    }

    Console.WriteLine();
}

Output:

输出:

Each line is a found subset, numbers in () are the original index of the number used in the subset

每一行都是一个被发现的子集,数字()是子集中使用的数字的原始索引。

14(60) 3(107)
14(60) 3(109)
14(60) 3(102)
13(59) 4(105)
13(59) 4(111)
12(64) 5(96)
12(64) 5(104)
12(64) 5(112)
12(64) 5(110)
12(65) 5(96)
12(65) 5(104)
12(65) 5(112)
12(65) 5(110)
11(100) 6(108)
11(100) 6(113)
11(61) 6(108)
11(61) 6(113)
11(92) 6(108)
11(92) 6(113)
11(62) 6(108)
11(62) 6(113)
11(99) 6(108)
11(99) 6(113)
9(103) 8(93)
9(103) 8(94)
9(103) 8(97)
9(103) 8(98)
9(103) 8(101)
Subsets found: 28
00:00:00.0017020 (measured when no printing is performed)

14(60)3(107)14(60)3(109)14(60)3(102)13(59)4(105)13(59)4(111)12(64)5(96)12(64)5(104)12(64)5(112)12(64)5(110)12(65)5(96)12(65)5(104)12(65)5(112)12(65)5(110)11(100)6(108)11(100)6(113)11(61)6(108)11(61)6(113)11(92)6(108)11(92)6(113)11(62)6(108)11(62)6(113)11(99)6(108)11(99)6(113)9(103)8(93)9(103)8(94)9(103)8(97)9(103)8(98)9(103)8(101)发现:子集28 00:00:00.0017020(当不执行打印)


The higher k is, the more cutoffs can be made. This is when you'll see the major performance difference. Your current code (the recursive version) also performs significantly slower when s is goes higher.

k越高,可以做的越少。这时您将看到主要的性能差异。当s值升高时,当前代码(递归版本)的执行速度也会显著降低。

With k=5, s=60
Your current code: found 45070 subsets in 2.8602923 seconds
My code: found 45070 subsets in 0.0116727 seconds
That is 99.6% speed improvement

当k=5, s=60时,你当前的代码:在2.8602923秒内找到45070个子集我的代码:在0.0116727秒内找到45070个子集,速度提高了99.6%

#2


0  

Simply search for all combinations of size K, and check each if it satisfies the condition.

只需搜索大小为K的所有组合,并检查每个组合是否满足条件。

The fastest algorithm for k combinations, suiting your case, would be:

最适合你的k个组合的最快算法是:

for (var i1 = 0; i1 <= n; i1++)
{
    for (var i2 = i1 + 1; i2 <= n; i2++)
    {
        for (var i3 = i2 + 1; i3 <= n; i3++)
        {
            ...

            for (var ik = iOneBeforeK + 1; ik <= n; ik++)
            {
                if (arr[i1] + arr[i2] + ... + arr[ik] == sum)
                {
                    // this is a valid subset
                }
            }
        }
    }
}

But you are talking about numbers and summing them up, meaning you could make cutoffs with a smarter algorithm.

但你说的是数字并把它们加起来,这意味着你可以用更聪明的算法来进行截断。

Since all numbers are positive, you know that if a single number is big enough, you cannot add to it any more positive numbers and have it sum up to s. Given s=6 and k=4, the highest number to include in the search is s-k+1=3. 3+1+1+1 is k numbers, 1 being your lowest possible number, sums up to 6. Any number above 3 couldn't have 3 other positives added to it, and have the sum <= 6.

因为所有的数都是正数,所以你知道如果一个数足够大,你就不能再给它加任何正数,让它和s相加。给定s=6和k=4,搜索中包含的最高的数是s-k+1=3。3+1+1+1是k个数,1是最小的数,加起来是6。3以上的任何数都不能再增加3个正数,并且总和<= 6。

But wait, your minimum possible value isn't 1, it is 3. That's even better. So assume k=10, n=60, min=3. The "highest number scenario" is x+min(k-1)=n -> x=60-3*9=33. So the highest number to even consider would be 33.

但是等一下,你的最小值不是1,而是3。这是更好的。假设k=10 n=60 min=3。“最高数字场景”是x+min(k-1)=n -> x=60-3*9=33。所以最高的数字是33。

This cuts the amount of relevant numbers in the array to consider, hence it cuts the amount of combinations to look for.

这将减少要考虑的数组中的相关数字的数量,因此将减少要查找的组合数量。

But it gets even better. Assume k=10, n=60, min=3. The first number in the array happens to be 20, so it is relevant and should be checked. Lets find the relevant subsets that include that 20:
A new "puzzle" appears! k=10-1, n=60-20, min=3. You can now cutoff many numbers from the subpuzzle, and again and again which each step further in.

但它变得更好了。假设10 k = n = 60,最小值= 3。数组中的第一个数字恰好是20,因此它是相关的,应该进行检查。让我们找到包含这20个的相关子集:一个新的“谜”出现了!k = 1,n = 60-20、最小值= 3。现在,你可以从子字谜中截取许多数字,并一次又一次地在每一步中更进一步。

This can be improved even further by calculating the average of the k-1 lowest numbers in the subpuzzle and use that as min.

通过计算k-1最小数字的平均值并将其用作最小数字,可以进一步改进这一点。

It is possible to improve this even further by precalculating the k lowest numbers average in subpuzzle [0..n], and k-1 lowest numbers average in subpuzzle [1..n], and k-2 lowest numbers average in subpuzzle [2..n] and so on, and use them instead of recalculating same stuff again and again within each subpuzzle assessment.

可以通过预先计算子拼图中k个最小值的平均值来进一步改进这个问题。n], k-1最小的子谜数平均值[1..]n], k-2最小的子谜数平均值[2。n]等等,使用它们,而不是在每个子拼图评估中反复计算相同的东西。

#3


0  

Try to use code below. Sorry, i had not time to optimise code. You can compare all methods that you have, and and make conclusion which is faster, don't forgot to share results.

尝试使用下面的代码。对不起,我没有时间优化代码。你可以比较你所有的方法,并得出结论,这更快,不要忘记分享结果。

C# (you can try to rid from List may be it will give you performance improvement:

c#(你可以试着从列表中删除它可能会给你的性能提升:

public static IEnumerable<List<int>> findSubsetsWithLengthKAndSumS2(Tuple<int, int> ks, List<int> set, List<int> subSet, List<int> subSetIndex)
    {
        if (ks.Item1 == 0 && ks.Item2 == 0)
        {
            var res = new List<List<int>>();
            res.Add(subSetIndex);
            return res;
        }
        else if (ks.Item1 > 0 && ks.Item2 > 0)
        {
            var res = set.Select((x, i) =>
            {
                var newSubset = subSet.Select(y => y).ToList();
                newSubset.Add(x);

                var newSubsetIndex = subSetIndex.Select(y => y).ToList();
                newSubsetIndex.Add(i);

                var newSet = set.Skip(i).ToList();
                return findSubsetsWithLengthKAndSumS2(Tuple.Create(ks.Item1 - 1, ks.Item2 - x), newSet, newSubset, newSubsetIndex).ToList();
            }
            ).SelectMany(x => x).ToList();
            return res;
        }
        else
            return new List<List<int>>();
    }
...
            var res = findSubsetsWithLengthKAndSumS2(Tuple.Create(2, 293), numbers.ToList(), new List<int>(), new List<int>());

F# (I added it Just for fun =), it is not optimized too, i beleive the slowest place in code is 'skip' ):

f#(我添加它只是为了好玩=),它也没有优化,我认为代码中最慢的地方是“跳过”):

let skip (list:List<int>) index = 
    list |> List.mapi (fun i x -> if i > index then Some(x) else None) |> List.filter (fun x -> x.IsSome) |> List.map (fun  x -> x.Value)

let rec findSubsetsWithLengthKAndSumS (ks:int*int) (set:list<int>) (subSet:list<int>) =
    [
        match ks with
        |0,0 ->   yield subSet 
        | x,y when x > 0 && y > 0 -> yield! set |> List.mapi (fun i x-> findSubsetsWithLengthKAndSumS ((fst ks)-1,(snd ks)-x)   (skip set i ) (x::subSet)) |> Seq.concat
        | _,_-> yield []
    ]
...
    let res = Subsets.findSubsetsWithLengthKAndSumS (2,293)  numbers [] |> List.filter (fun x-> x.Length >0) 

I beleive this Iterative version would be faster than others in many times. It uses .net 2.0:

我相信这个迭代版本在很多时候会比其他版本更快。它使用。net 2.0:

    public delegate IEnumerable< IEnumerable< int > > findSubset();

    public delegate bool findSubsetsIterativeFilter( int[] sourceSet, int[] indiciesToSum, int expected );

    public static bool Summ( int[] sourceSet, int[] indicies, int expected )
    {
        var sum = 0;
        for( int i = 0; i < indicies.Length; i++ )
            sum += sourceSet[ indicies[ i ] ];
        return sum == expected;
    }

    public static IEnumerable< IEnumerable< int > > findSubsetsIterative( int k, int[] sourceSet, findSubsetsIterativeFilter specialCondition, int expected )
    {
        var a = new int[ k ];
        for( int i = 0; i < k; i++ )
            a[ i ] = i; 

        var p = k - 1;
        while( p >= 0 )
        {
            if( specialCondition( sourceSet, a, expected ) )
                yield return ( int[] )a.Clone();
            p = ( a[ k - 1 ] == sourceSet.Length - 1 ) ? p - 1 : k - 1;
            if( p >= 0 )
                for( int i = k - 1; i >= p; i-- )
                    a[ i ] = a[ p ] + i - p + 1;
        }
    }
    ...
       findSubsetsIterative( 2, a, Summ, 293 );

I had measured my code and Yorye's and here is what i get. My code is faster in 4-10x. Did you used 'findSubsetsIterative' from my answer in your experiments?

我测量了我的代码和你的,这就是我得到的。我的代码在4-10x中更快。你在实验中使用了我的答案中的findSubsetsIterative吗?

findSubsetsIterative( 4, GenerateSOurceData(1), Summ, 400 ) Elapsed: 00:00:00.0012113 puzzle.Solve(4, 400, PuzzleOnSubsetFound) Elapsed: 00:00:00.0046170

findSubsetsIterative(4, GenerateSOurceData(1) Summ, 400)经过:00:00:00:00:00:00:00 .0012113难题。解决(4,400,PuzzleOnSubsetFound)经过:00:00:00.0046170

findSubsetsIterative( 5, GenerateSOurceData(1), Summ, 60 ) Elapsed: 00:00:00.0012960 puzzle.Solve(5, 60, PuzzleOnSubsetFound) Elapsed: 00:00:00.0108568

findSubsetsIterative(5, GenerateSOurceData(1), Summ, 60)经过:00:00:00:00:00:00 .0012960拼图。解决(5,60,PuzzleOnSubsetFound)经过:00:00:00.0108568

Here i increased incoming array in 5x (just copied array 5 times into new big array): findSubsetsIterative( 5, GenerateSOurceData(5), Summ, 60 ) Elapsed: 00:00:00.0013067 puzzle.Solve(5, 60, PuzzleOnSubsetFound) Elapsed: 00:00:21.3520840

在这里,我在5x中增加了传入数组(只是将数组复制了5次到新的大数组中):findSubsetsIterative(5, GenerateSOurceData(5), Summ, 60) Elapsed: 00:00:00:00:00:00 00.0013067难题。解决(5,60,PuzzleOnSubsetFound)经过:00:21.3520840

#1


2  

My previous answer works on the principle of cutting off the number of combinations to check. But this can be improved significantly once you sort the array. The principle is similar, but since the solution is entirely different, I've decided to put it in a separate answer.

我之前的回答是关于减少组合数量的原则。但是,一旦对数组进行排序,这一点可以得到显著改善。原理是相似的,但是由于解决方案是完全不同的,我决定把它放在一个单独的答案中。

I was careful to use only .Net Framework 2.0 features. Might add a visual explanation later, but the comments should be enough.

我谨慎地只使用。net Framework 2.0特性。稍后可能会添加一个可视的解释,但是注释应该足够了。

class Puzzle
{
    private readonly int[] _tailSums;
    public readonly SubsetElement[] Elements;
    public readonly int N;

    public Puzzle(int[] numbers)
    {
        // Set N and make Elements array
        // (to remember the original index of each element)
        this.N = numbers.Length;
        this.Elements = new SubsetElement[this.N];

        for (var i = 0; i < this.N; i++)
        {
            this.Elements[i] = new SubsetElement(numbers[i], i);
        }

        // Sort Elements descendingly by their Number value
        Array.Sort(this.Elements, (a, b) => b.Number.CompareTo(a.Number));

        // Save tail-sums to allow immediate access by index
        // Allow immedate calculation by index = N, to sum 0
        this._tailSums = new int[this.N + 1];
        var sum = 0;

        for (var i = this.N - 1; i >= 0; i--)
        {
            this._tailSums[i] = sum += this.Elements[i].Number;
        }
    }

    public void Solve(int s, Action<SubsetElement[]> callback)
    {
        for (var k = 1; k <= this.N; k++)
            this.Solve(k, s, callback);
    }

    public void Solve(int k, int s, Action<SubsetElement[]> callback)
    {
        this.ScanSubsets(0, k, s, new List<SubsetElement>(), callback);
    }

    private void ScanSubsets(int startIndex, int k, int s,
                             List<SubsetElement> subset, Action<SubsetElement[]> cb)
    {
        // No more numbers to add, and current subset is guranteed to be valid
        if (k == 0)
        {
            // Callback with current subset
            cb(subset.ToArray());
            return;
        }

        // Sum the smallest k elements
        var minSubsetStartIndex = this.N - k;
        var minSum = this._tailSums[minSubssetStartIndex];

        // Smallest possible sum is greater than wanted sum,
        // so a valid subset cannot be found
        if (minSum > s)
        {
            return;
        }

        // Find largest number that satisfies the condition
        // that a valid subset can be found
        minSum -= this.Elements[minSubsetStartIndex].Number;

        // But remember the last index that satisfies the condition
        var minSubsetEndIndex = minSubsetStartIndex;

        while (minSubsetStartIndex > startIndex &&
               minSum + this.Elements[minSubsetStartIndex - 1].Number <= s)
        {
            minSubsetStartIndex--;
        }

        // Find the first number in the sorted sequence that is
        // the largest number we just found (in case of duplicates)
        while (minSubsetStartIndex > startIndex &&
               Elements[minSubsetStartIndex] == Elements[minSubsetStartIndex - 1])
        {
            minSubsetStartIndex--;
        }

        // [minSubsetStartIndex .. maxSubsetEndIndex] is the
        // full range we must check in recursion

        for (var subsetStartIndex = minSubsetStartIndex;
             subsetStartIndex <= minSubsetEndIndex;
             subsetStartIndex++)
        {
            // Find the largest possible sum, which is the sum of the
            // k first elements, starting at current subsetStartIndex
            var maxSum = this._tailSums[subsetStartIndex] -
                         this._tailSums[subsetStartIndex + k];

            // The largest possible sum is less than the wanted sum,
            // so a valid subset cannot be found
            if (maxSum < s)
            {
                return;
            }

            // Add current number to the subset
            var x = this.Elements[subsetStartIndex];
            subset.Add(x);

            // Recurse through the sub-problem to the right
            this.ScanSubsets(subsetStartIndex + 1, k - 1, s - x.Number, subset, cb);

            // Remove current number and continue loop
            subset.RemoveAt(subset.Count - 1);
        }
    }

    public sealed class SubsetElement
    {
        public readonly int Number;
        public readonly int Index;

        public SubsetElement(int number, int index)
        {
            this.Number = number;
            this.Index = index;
        }

        public override string ToString()
        {
            return string.Format("{0}({1})", this.Number, this.Index);
        }
    }
}

Usage and performance testing:

使用和性能测试:

private static void Main()
{
    var sw = Stopwatch.StartNew();
    var puzzle = new Puzzle(new[]
        {
            7, 286, 200, 176, 120, 165, 206, 75, 129, 109,
            123, 111, 43, 52, 99, 128, 111, 110, 98, 135,
            112, 78, 118, 64, 77, 227, 93, 88, 69, 60,
            34, 30, 73, 54, 45, 83, 182, 88, 75, 85,
            54, 53, 89, 59, 37, 35, 38, 29, 18, 45,
            60, 49, 62, 55, 78, 96, 29, 22, 24, 13,
            14, 11, 11, 18, 12, 12, 30, 52, 52, 44,
            28, 28, 20, 56, 40, 31, 50, 40, 46, 42,
            29, 19, 36, 25, 22, 17, 19, 26, 30, 20,
            15, 21, 11, 8, 8, 19, 5, 8, 8, 11,
            11, 8, 3, 9, 5, 4, 7, 3, 6, 3,
            5, 4, 5, 6
        });

    puzzle.Solve(2, 17, PuzzleOnSubsetFound);

    sw.Stop();
    Console.WriteLine("Subsets found: " + _subsetsCount);
    Console.WriteLine(sw.Elapsed);
}

private static int _subsetsCount;

private static void PuzzleOnSubsetFound(Puzzle.SubsetElement[] subset)
{
    _subsetsCount++;
    return; // Skip prints when speed-testing

    foreach (var el in subset)
    {
        Console.Write(el.ToString());
        Console.Write("  ");
    }

    Console.WriteLine();
}

Output:

输出:

Each line is a found subset, numbers in () are the original index of the number used in the subset

每一行都是一个被发现的子集,数字()是子集中使用的数字的原始索引。

14(60) 3(107)
14(60) 3(109)
14(60) 3(102)
13(59) 4(105)
13(59) 4(111)
12(64) 5(96)
12(64) 5(104)
12(64) 5(112)
12(64) 5(110)
12(65) 5(96)
12(65) 5(104)
12(65) 5(112)
12(65) 5(110)
11(100) 6(108)
11(100) 6(113)
11(61) 6(108)
11(61) 6(113)
11(92) 6(108)
11(92) 6(113)
11(62) 6(108)
11(62) 6(113)
11(99) 6(108)
11(99) 6(113)
9(103) 8(93)
9(103) 8(94)
9(103) 8(97)
9(103) 8(98)
9(103) 8(101)
Subsets found: 28
00:00:00.0017020 (measured when no printing is performed)

14(60)3(107)14(60)3(109)14(60)3(102)13(59)4(105)13(59)4(111)12(64)5(96)12(64)5(104)12(64)5(112)12(64)5(110)12(65)5(96)12(65)5(104)12(65)5(112)12(65)5(110)11(100)6(108)11(100)6(113)11(61)6(108)11(61)6(113)11(92)6(108)11(92)6(113)11(62)6(108)11(62)6(113)11(99)6(108)11(99)6(113)9(103)8(93)9(103)8(94)9(103)8(97)9(103)8(98)9(103)8(101)发现:子集28 00:00:00.0017020(当不执行打印)


The higher k is, the more cutoffs can be made. This is when you'll see the major performance difference. Your current code (the recursive version) also performs significantly slower when s is goes higher.

k越高,可以做的越少。这时您将看到主要的性能差异。当s值升高时,当前代码(递归版本)的执行速度也会显著降低。

With k=5, s=60
Your current code: found 45070 subsets in 2.8602923 seconds
My code: found 45070 subsets in 0.0116727 seconds
That is 99.6% speed improvement

当k=5, s=60时,你当前的代码:在2.8602923秒内找到45070个子集我的代码:在0.0116727秒内找到45070个子集,速度提高了99.6%

#2


0  

Simply search for all combinations of size K, and check each if it satisfies the condition.

只需搜索大小为K的所有组合,并检查每个组合是否满足条件。

The fastest algorithm for k combinations, suiting your case, would be:

最适合你的k个组合的最快算法是:

for (var i1 = 0; i1 <= n; i1++)
{
    for (var i2 = i1 + 1; i2 <= n; i2++)
    {
        for (var i3 = i2 + 1; i3 <= n; i3++)
        {
            ...

            for (var ik = iOneBeforeK + 1; ik <= n; ik++)
            {
                if (arr[i1] + arr[i2] + ... + arr[ik] == sum)
                {
                    // this is a valid subset
                }
            }
        }
    }
}

But you are talking about numbers and summing them up, meaning you could make cutoffs with a smarter algorithm.

但你说的是数字并把它们加起来,这意味着你可以用更聪明的算法来进行截断。

Since all numbers are positive, you know that if a single number is big enough, you cannot add to it any more positive numbers and have it sum up to s. Given s=6 and k=4, the highest number to include in the search is s-k+1=3. 3+1+1+1 is k numbers, 1 being your lowest possible number, sums up to 6. Any number above 3 couldn't have 3 other positives added to it, and have the sum <= 6.

因为所有的数都是正数,所以你知道如果一个数足够大,你就不能再给它加任何正数,让它和s相加。给定s=6和k=4,搜索中包含的最高的数是s-k+1=3。3+1+1+1是k个数,1是最小的数,加起来是6。3以上的任何数都不能再增加3个正数,并且总和<= 6。

But wait, your minimum possible value isn't 1, it is 3. That's even better. So assume k=10, n=60, min=3. The "highest number scenario" is x+min(k-1)=n -> x=60-3*9=33. So the highest number to even consider would be 33.

但是等一下,你的最小值不是1,而是3。这是更好的。假设k=10 n=60 min=3。“最高数字场景”是x+min(k-1)=n -> x=60-3*9=33。所以最高的数字是33。

This cuts the amount of relevant numbers in the array to consider, hence it cuts the amount of combinations to look for.

这将减少要考虑的数组中的相关数字的数量,因此将减少要查找的组合数量。

But it gets even better. Assume k=10, n=60, min=3. The first number in the array happens to be 20, so it is relevant and should be checked. Lets find the relevant subsets that include that 20:
A new "puzzle" appears! k=10-1, n=60-20, min=3. You can now cutoff many numbers from the subpuzzle, and again and again which each step further in.

但它变得更好了。假设10 k = n = 60,最小值= 3。数组中的第一个数字恰好是20,因此它是相关的,应该进行检查。让我们找到包含这20个的相关子集:一个新的“谜”出现了!k = 1,n = 60-20、最小值= 3。现在,你可以从子字谜中截取许多数字,并一次又一次地在每一步中更进一步。

This can be improved even further by calculating the average of the k-1 lowest numbers in the subpuzzle and use that as min.

通过计算k-1最小数字的平均值并将其用作最小数字,可以进一步改进这一点。

It is possible to improve this even further by precalculating the k lowest numbers average in subpuzzle [0..n], and k-1 lowest numbers average in subpuzzle [1..n], and k-2 lowest numbers average in subpuzzle [2..n] and so on, and use them instead of recalculating same stuff again and again within each subpuzzle assessment.

可以通过预先计算子拼图中k个最小值的平均值来进一步改进这个问题。n], k-1最小的子谜数平均值[1..]n], k-2最小的子谜数平均值[2。n]等等,使用它们,而不是在每个子拼图评估中反复计算相同的东西。

#3


0  

Try to use code below. Sorry, i had not time to optimise code. You can compare all methods that you have, and and make conclusion which is faster, don't forgot to share results.

尝试使用下面的代码。对不起,我没有时间优化代码。你可以比较你所有的方法,并得出结论,这更快,不要忘记分享结果。

C# (you can try to rid from List may be it will give you performance improvement:

c#(你可以试着从列表中删除它可能会给你的性能提升:

public static IEnumerable<List<int>> findSubsetsWithLengthKAndSumS2(Tuple<int, int> ks, List<int> set, List<int> subSet, List<int> subSetIndex)
    {
        if (ks.Item1 == 0 && ks.Item2 == 0)
        {
            var res = new List<List<int>>();
            res.Add(subSetIndex);
            return res;
        }
        else if (ks.Item1 > 0 && ks.Item2 > 0)
        {
            var res = set.Select((x, i) =>
            {
                var newSubset = subSet.Select(y => y).ToList();
                newSubset.Add(x);

                var newSubsetIndex = subSetIndex.Select(y => y).ToList();
                newSubsetIndex.Add(i);

                var newSet = set.Skip(i).ToList();
                return findSubsetsWithLengthKAndSumS2(Tuple.Create(ks.Item1 - 1, ks.Item2 - x), newSet, newSubset, newSubsetIndex).ToList();
            }
            ).SelectMany(x => x).ToList();
            return res;
        }
        else
            return new List<List<int>>();
    }
...
            var res = findSubsetsWithLengthKAndSumS2(Tuple.Create(2, 293), numbers.ToList(), new List<int>(), new List<int>());

F# (I added it Just for fun =), it is not optimized too, i beleive the slowest place in code is 'skip' ):

f#(我添加它只是为了好玩=),它也没有优化,我认为代码中最慢的地方是“跳过”):

let skip (list:List<int>) index = 
    list |> List.mapi (fun i x -> if i > index then Some(x) else None) |> List.filter (fun x -> x.IsSome) |> List.map (fun  x -> x.Value)

let rec findSubsetsWithLengthKAndSumS (ks:int*int) (set:list<int>) (subSet:list<int>) =
    [
        match ks with
        |0,0 ->   yield subSet 
        | x,y when x > 0 && y > 0 -> yield! set |> List.mapi (fun i x-> findSubsetsWithLengthKAndSumS ((fst ks)-1,(snd ks)-x)   (skip set i ) (x::subSet)) |> Seq.concat
        | _,_-> yield []
    ]
...
    let res = Subsets.findSubsetsWithLengthKAndSumS (2,293)  numbers [] |> List.filter (fun x-> x.Length >0) 

I beleive this Iterative version would be faster than others in many times. It uses .net 2.0:

我相信这个迭代版本在很多时候会比其他版本更快。它使用。net 2.0:

    public delegate IEnumerable< IEnumerable< int > > findSubset();

    public delegate bool findSubsetsIterativeFilter( int[] sourceSet, int[] indiciesToSum, int expected );

    public static bool Summ( int[] sourceSet, int[] indicies, int expected )
    {
        var sum = 0;
        for( int i = 0; i < indicies.Length; i++ )
            sum += sourceSet[ indicies[ i ] ];
        return sum == expected;
    }

    public static IEnumerable< IEnumerable< int > > findSubsetsIterative( int k, int[] sourceSet, findSubsetsIterativeFilter specialCondition, int expected )
    {
        var a = new int[ k ];
        for( int i = 0; i < k; i++ )
            a[ i ] = i; 

        var p = k - 1;
        while( p >= 0 )
        {
            if( specialCondition( sourceSet, a, expected ) )
                yield return ( int[] )a.Clone();
            p = ( a[ k - 1 ] == sourceSet.Length - 1 ) ? p - 1 : k - 1;
            if( p >= 0 )
                for( int i = k - 1; i >= p; i-- )
                    a[ i ] = a[ p ] + i - p + 1;
        }
    }
    ...
       findSubsetsIterative( 2, a, Summ, 293 );

I had measured my code and Yorye's and here is what i get. My code is faster in 4-10x. Did you used 'findSubsetsIterative' from my answer in your experiments?

我测量了我的代码和你的,这就是我得到的。我的代码在4-10x中更快。你在实验中使用了我的答案中的findSubsetsIterative吗?

findSubsetsIterative( 4, GenerateSOurceData(1), Summ, 400 ) Elapsed: 00:00:00.0012113 puzzle.Solve(4, 400, PuzzleOnSubsetFound) Elapsed: 00:00:00.0046170

findSubsetsIterative(4, GenerateSOurceData(1) Summ, 400)经过:00:00:00:00:00:00:00 .0012113难题。解决(4,400,PuzzleOnSubsetFound)经过:00:00:00.0046170

findSubsetsIterative( 5, GenerateSOurceData(1), Summ, 60 ) Elapsed: 00:00:00.0012960 puzzle.Solve(5, 60, PuzzleOnSubsetFound) Elapsed: 00:00:00.0108568

findSubsetsIterative(5, GenerateSOurceData(1), Summ, 60)经过:00:00:00:00:00:00 .0012960拼图。解决(5,60,PuzzleOnSubsetFound)经过:00:00:00.0108568

Here i increased incoming array in 5x (just copied array 5 times into new big array): findSubsetsIterative( 5, GenerateSOurceData(5), Summ, 60 ) Elapsed: 00:00:00.0013067 puzzle.Solve(5, 60, PuzzleOnSubsetFound) Elapsed: 00:00:21.3520840

在这里,我在5x中增加了传入数组(只是将数组复制了5次到新的大数组中):findSubsetsIterative(5, GenerateSOurceData(5), Summ, 60) Elapsed: 00:00:00:00:00:00 00.0013067难题。解决(5,60,PuzzleOnSubsetFound)经过:00:21.3520840