在总和匹配的两组整数中查找子集的算法

时间:2021-12-02 11:29:32

I'm looking for an algorithm which can take two sets of integers (both positive and negative) and find subsets within each that have the same sum.

我正在寻找一种算法,它可以采用两组整数(正数和负数),并找到每个具有相同总和的子集。

The problem is similar to the subset sum problem except that I'm looking for subsets on both sides.

问题类似于子集和问题,除了我正在寻找两侧的子集。

Here's an example:

这是一个例子:

List A {4, 5, 9, 10, 1}

列表A {4,5,9,10,1}

List B {21, 7, -4, 180}

清单B {21,7,-4,180}

So the only match here is: {10, 1, 4, 9} <=> {21, 7, -4}

所以这里唯一的匹配是:{10,1,4,9} <=> {21,7,-4}

Does anyone know if there are existing algorithms for this kinda problems?

有谁知道这种问题是否存在现有的算法?

So far, the only solution I have is a brute force approach which tries every combination but it performs in Exponential time and I've had to put a hard limit on the number of elements to consider to avoid it from taking too long.

到目前为止,我所拥有的唯一解决方案是蛮力方法,它尝试每个组合,但它在指数时间内执行,我不得不对要考虑的元素数量进行严格限制,以避免花费太长时间。

The only other solution I can think of is to run a factorial on both lists and look for equalities there but that is still not very efficient and takes exponentially longer as the lists get bigger.

我能想到的唯一其他解决方案是在两个列表上运行一个阶乘并在那里寻找均衡但这仍然不是非常有效,并且随着列表变大而需要指数地增长。

4 个解决方案

#1


What others have said is true:

别人说的是真的:

  1. This problem is NP-complete. An easy reduction is to usual subset-sum. You can show this by noting that a subset of A sums to a subset of B (not both empty) only if a non-empty subset of A union (-B) sums to zero.

    这个问题是NP完全的。容易减少的是通常的子集和。只有当A联合(-B)的非空子集总和为零时,您才能通过注意A的子集与B的子集(不是两者都是和)来表示这一点。

  2. This problem is only weakly NP-complete, in that it's polynomial in the size of the numbers involved, but is conjectured to be exponential in their logarithms. This means that the problem is easier than the moniker "NP-complete" might suggest.

    这个问题只是NP完全的弱,因为它是所涉及数字大小的多项式,但推测它们的对数是指数的。这意味着问题比“NP-complete”这个名字更容易提出。

  3. You should use dynamic programming.

    你应该使用动态编程。

So what am I contributing to this discussion? Well, code (in Perl):

那么我对这次讨论的贡献是什么?好吧,代码(在Perl中):

@a = qw(4 5 9 10 1);
@b = qw(21 7 -4 180);
%a = sums( @a );
%b = sums( @b );
for $m ( keys %a ) {
    next unless exists $b{$m};
    next if $m == 0 and (@{$a{0}} == 0 or @{$b{0}} == 0);
    print "sum(@{$a{$m}}) = sum(@{$b{$m}})\n";
}

sub sums {
    my( @a ) = @_;
    my( $a, %a, %b );
    %a = ( 0 => [] );
    while( @a ) {
        %b = %a;
        $a = shift @a;
        for my $m ( keys %a ) {
            $b{$m+$a} = [@{$a{$m}},$a];
        }
    %a = %b;
    }
    return %a;
}

It prints

sum(4 5 9 10) = sum(21 7)
sum(4 9 10 1) = sum(21 7 -4)

so, notably, there is more than one solution that works in your original problem!

所以,值得注意的是,有多个解决方案适用于您的原始问题!

Edit: User itzy correctly pointed out that this solution was wrong, and worse, in multiple ways!! I'm very sorry about that and I've hopefully addressed these concerns in the new code above. Nonetheless, there is still one problem, namely that for any particular subset-sum, it only prints one of the possible solutions. Unlike the previous problems, which were straight-up errors, I would classify this as an intentional limitation. Best of luck and beware of bugs!

编辑:用户itzy正确地指出这个解决方案是错误的,更糟糕​​的是,在多种方式!!我很抱歉,我希望在上面的新代码中解决这些问题。尽管如此,仍然存在一个问题,即对于任何特定的子集和,它只打印一种可能的解决方案。与之前出现的直接错误问题不同,我将其归类为故意限制。祝你好运并提防虫子!

#2


Like the subset sum problem, this problem is weakly NP-complete, so it has a solution that runs in time polynomial(M), where M is the sum of all numbers appearing in the problem instance. You can achieve that with dynamic programming. For each set you can generate all possible sums by filling a 2-dimensional binary table, where "true" at (k,m) means that a subset sum m can be achieved by picking some elements from the first k elements of the set.

像子集和问题一样,这个问题是NP完全弱的,因此它有一个以时间多项式(M)运行的解,其中M是问题实例中出现的所有数字的总和。您可以通过动态编程实现这一目标。对于每个集合,您可以通过填充二维二进制表来生成所有可能的总和,其中(k,m)处的“真”表示可以通过从集合的前k个元素中挑选一些元素来实现子集和m。

You fill it iteratively - you set (k,m) to "true" if (k-1,m) is set to "true" (obviously, if you can get m from k-1 elements, you can get it from k elements by not picking the k-th) or if (k-1,m-d) is set to "true" where d is the value of k-th element in the set (the case where you pick the k-th element).

你迭代地填充它 - 你将(k,m)设置为“true”如果(k-1,m)设置为“true”(显然,如果你可以从k-1元素得到m,你可以从k得到它通过不选择第k个元素或if(k-1,md)被设置为“true”,其中d是集合中第k个元素的值(选择第k个元素的情况)。

Filling the table gets you all the possible sums in the last column (the one representing the whole set). Do this for both sets and find common sums. You can backtrack the actual subsets representing the solutions by reversing the process which you used to fill the tables.

填充表格可以获得最后一列中的所有可能总和(表示整个集合的总和)。对两个集合执行此操作并查找公共总和。您可以通过反转用于填充表的过程来回溯表示解决方案的实际子集。

#3


Thanks a lot for all the quick responses!

非常感谢所有快速回复!

The dynamic programming solution is not really different to the exhaustive approch we have right now and I guess if we need the optimal solution we do need to consider every possible combination but the time it takes to generate this exhaustive list of sums are too long.. Did a quick test and the time it takes to generate all possible sums for x number of elements very quickly go over 1 min:

动态编程解决方案与我们现在的详尽解决方案并没有什么不同,我想如果我们需要最优解,我们需要考虑每个可能的组合,但生成这个详尽的总和列表所花费的时间太长了。快速测试以及为x个元素生成所有可能总和所需的时间超过1分钟:

11 elements took - 0.015625 seconds
12 elements took - 0.015625 seconds
13 elements took - 0.046875 seconds
14 elements took - 0.109375 seconds
15 elements took - 0.171875 seconds
16 elements took - 0.359375 seconds
17 elements took - 0.765625 seconds
18 elements took - 1.609375 seconds
19 elements took - 3.40625 seconds
20 elements took - 7.15625 seconds
21 elements took - 14.96875 seconds
22 elements took - 31.40625 seconds
23 elements took - 65.875 seconds
24 elements took - 135.953125 seconds
25 elements took - 282.015625 seconds
26 elements took - 586.140625 seconds
27 elements took - 1250.421875 seconds
28 elements took - 2552.53125 seconds
29 elements took - 5264.34375 seconds

which for the business problem we're trying to solve is not really acceptable.. I'm gonna go back to the drawing board and see whether we do indeed need to know all the solutions or can we just do with one (the smallest/largest subset, e.g.) instead and hopefully that can help simply the problem and make my algorithm perform to expectaion.

对于我们试图解决的业务问题并不是真的可以接受。我将回到绘图板,看看我们是否确实需要知道所有的解决方案,或者我们只能做一个(最小的/最大的子集,例如)而且希望这可以帮助简化问题并使我的算法执行预期。

Thanks all the same!

谢谢你们!

#4


subset sum is Np-complete and you can polynomially reduce your problem to it, so your problem is NP-complete too.

子集和是Np-complete,你可以多项式地减少你的问题,所以你的问题也是NP完全的。

#1


What others have said is true:

别人说的是真的:

  1. This problem is NP-complete. An easy reduction is to usual subset-sum. You can show this by noting that a subset of A sums to a subset of B (not both empty) only if a non-empty subset of A union (-B) sums to zero.

    这个问题是NP完全的。容易减少的是通常的子集和。只有当A联合(-B)的非空子集总和为零时,您才能通过注意A的子集与B的子集(不是两者都是和)来表示这一点。

  2. This problem is only weakly NP-complete, in that it's polynomial in the size of the numbers involved, but is conjectured to be exponential in their logarithms. This means that the problem is easier than the moniker "NP-complete" might suggest.

    这个问题只是NP完全的弱,因为它是所涉及数字大小的多项式,但推测它们的对数是指数的。这意味着问题比“NP-complete”这个名字更容易提出。

  3. You should use dynamic programming.

    你应该使用动态编程。

So what am I contributing to this discussion? Well, code (in Perl):

那么我对这次讨论的贡献是什么?好吧,代码(在Perl中):

@a = qw(4 5 9 10 1);
@b = qw(21 7 -4 180);
%a = sums( @a );
%b = sums( @b );
for $m ( keys %a ) {
    next unless exists $b{$m};
    next if $m == 0 and (@{$a{0}} == 0 or @{$b{0}} == 0);
    print "sum(@{$a{$m}}) = sum(@{$b{$m}})\n";
}

sub sums {
    my( @a ) = @_;
    my( $a, %a, %b );
    %a = ( 0 => [] );
    while( @a ) {
        %b = %a;
        $a = shift @a;
        for my $m ( keys %a ) {
            $b{$m+$a} = [@{$a{$m}},$a];
        }
    %a = %b;
    }
    return %a;
}

It prints

sum(4 5 9 10) = sum(21 7)
sum(4 9 10 1) = sum(21 7 -4)

so, notably, there is more than one solution that works in your original problem!

所以,值得注意的是,有多个解决方案适用于您的原始问题!

Edit: User itzy correctly pointed out that this solution was wrong, and worse, in multiple ways!! I'm very sorry about that and I've hopefully addressed these concerns in the new code above. Nonetheless, there is still one problem, namely that for any particular subset-sum, it only prints one of the possible solutions. Unlike the previous problems, which were straight-up errors, I would classify this as an intentional limitation. Best of luck and beware of bugs!

编辑:用户itzy正确地指出这个解决方案是错误的,更糟糕​​的是,在多种方式!!我很抱歉,我希望在上面的新代码中解决这些问题。尽管如此,仍然存在一个问题,即对于任何特定的子集和,它只打印一种可能的解决方案。与之前出现的直接错误问题不同,我将其归类为故意限制。祝你好运并提防虫子!

#2


Like the subset sum problem, this problem is weakly NP-complete, so it has a solution that runs in time polynomial(M), where M is the sum of all numbers appearing in the problem instance. You can achieve that with dynamic programming. For each set you can generate all possible sums by filling a 2-dimensional binary table, where "true" at (k,m) means that a subset sum m can be achieved by picking some elements from the first k elements of the set.

像子集和问题一样,这个问题是NP完全弱的,因此它有一个以时间多项式(M)运行的解,其中M是问题实例中出现的所有数字的总和。您可以通过动态编程实现这一目标。对于每个集合,您可以通过填充二维二进制表来生成所有可能的总和,其中(k,m)处的“真”表示可以通过从集合的前k个元素中挑选一些元素来实现子集和m。

You fill it iteratively - you set (k,m) to "true" if (k-1,m) is set to "true" (obviously, if you can get m from k-1 elements, you can get it from k elements by not picking the k-th) or if (k-1,m-d) is set to "true" where d is the value of k-th element in the set (the case where you pick the k-th element).

你迭代地填充它 - 你将(k,m)设置为“true”如果(k-1,m)设置为“true”(显然,如果你可以从k-1元素得到m,你可以从k得到它通过不选择第k个元素或if(k-1,md)被设置为“true”,其中d是集合中第k个元素的值(选择第k个元素的情况)。

Filling the table gets you all the possible sums in the last column (the one representing the whole set). Do this for both sets and find common sums. You can backtrack the actual subsets representing the solutions by reversing the process which you used to fill the tables.

填充表格可以获得最后一列中的所有可能总和(表示整个集合的总和)。对两个集合执行此操作并查找公共总和。您可以通过反转用于填充表的过程来回溯表示解决方案的实际子集。

#3


Thanks a lot for all the quick responses!

非常感谢所有快速回复!

The dynamic programming solution is not really different to the exhaustive approch we have right now and I guess if we need the optimal solution we do need to consider every possible combination but the time it takes to generate this exhaustive list of sums are too long.. Did a quick test and the time it takes to generate all possible sums for x number of elements very quickly go over 1 min:

动态编程解决方案与我们现在的详尽解决方案并没有什么不同,我想如果我们需要最优解,我们需要考虑每个可能的组合,但生成这个详尽的总和列表所花费的时间太长了。快速测试以及为x个元素生成所有可能总和所需的时间超过1分钟:

11 elements took - 0.015625 seconds
12 elements took - 0.015625 seconds
13 elements took - 0.046875 seconds
14 elements took - 0.109375 seconds
15 elements took - 0.171875 seconds
16 elements took - 0.359375 seconds
17 elements took - 0.765625 seconds
18 elements took - 1.609375 seconds
19 elements took - 3.40625 seconds
20 elements took - 7.15625 seconds
21 elements took - 14.96875 seconds
22 elements took - 31.40625 seconds
23 elements took - 65.875 seconds
24 elements took - 135.953125 seconds
25 elements took - 282.015625 seconds
26 elements took - 586.140625 seconds
27 elements took - 1250.421875 seconds
28 elements took - 2552.53125 seconds
29 elements took - 5264.34375 seconds

which for the business problem we're trying to solve is not really acceptable.. I'm gonna go back to the drawing board and see whether we do indeed need to know all the solutions or can we just do with one (the smallest/largest subset, e.g.) instead and hopefully that can help simply the problem and make my algorithm perform to expectaion.

对于我们试图解决的业务问题并不是真的可以接受。我将回到绘图板,看看我们是否确实需要知道所有的解决方案,或者我们只能做一个(最小的/最大的子集,例如)而且希望这可以帮助简化问题并使我的算法执行预期。

Thanks all the same!

谢谢你们!

#4


subset sum is Np-complete and you can polynomially reduce your problem to it, so your problem is NP-complete too.

子集和是Np-complete,你可以多项式地减少你的问题,所以你的问题也是NP完全的。