Finding the subset sums of a huge data set

Time: 2022-02-02 21:46:22

First of all: I'm not a programmer and I never learnt programming or algorithms. Still, I do have to program, mostly in awk or ruby, plus some bash.

In today's task, I have a huge data set (floating-point numbers) in a plain text file, one record per line, and a sum of all the numbers in the set. But the sum is wrong, because some of the numbers in the set (possibly only one) are negative, and we can't see that in the file (a negative element carries no sign).

But I have to find it/them. So first I calculated the unsigned total (adding up all the numbers with awk, ignoring their signs). Now I know the difference between the original sum (which took the signs into account) and my new total. Each negative number is counted with the wrong sign in my total, so it contributes twice its value to that difference. What I have to find, then, is every subset of the data set whose sum is exactly difference/2.

E.g.:

DATA:
1,2,3,4,5

ORIG SUM: 
5  

Now we can calculate the difference between 1+2+3+4+5 and ORIG SUM: 15-5=10. Half of that is 10/2 = 5, so I need to find all the subsets that add up to 5, namely [1,4], [2,3] and [5].
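The arithmetic in this example can be sketched in Python. This is only a minimal sketch, not a definitive solution: the function name `subsets_with_sum` is made up for illustration, the values are assumed to be non-negative (as in the unsigned file), and they are parsed with `decimal.Decimal` so that comparing sums for exact equality does not suffer from binary floating-point rounding. It is a plain backtracking search with pruning, so its worst case is still exponential in the number of records.

```python
from decimal import Decimal


def subsets_with_sum(values, target):
    """Yield every subset of `values` (as a list) whose elements add up to `target`.

    Assumption: all values are non-negative. Equal values at different
    positions are treated as different elements.
    """
    nums = sorted(Decimal(str(v)) for v in values)
    target = Decimal(str(target))

    # suffix[i] = sum of nums[i:], used to abandon branches that cannot reach the target
    suffix = [Decimal(0)] * (len(nums) + 1)
    for i in range(len(nums) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + nums[i]

    def search(start, remaining, chosen):
        if remaining == 0:
            yield list(chosen)
        for i in range(start, len(nums)):
            if nums[i] > remaining:
                break  # sorted ascending: every later value is too large as well
            if suffix[i] < remaining:
                break  # even taking everything that is left cannot reach the target
            chosen.append(nums[i])
            yield from search(i + 1, remaining - nums[i], chosen)
            chosen.pop()

    yield from search(0, target, [])


# The example from the question: DATA = 1,2,3,4,5 and ORIG SUM = 5
data = ["1", "2", "3", "4", "5"]
total = sum(Decimal(x) for x in data)   # unsigned total
target = (total - Decimal("5")) / 2     # half of the difference
for subset in subsets_with_sum(data, target):
    print([str(x) for x in subset])
```

For the example data this yields the three subsets listed above: [1,4], [2,3] and [5]. On a truly huge file the number of matching subsets can itself be enormous, so listing them all may not be practical.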

Is there a proper way to do it? I prefer awk, ruby or shell scripting, but python and perl are both acceptable (without heavy use of external libraries, as I have no rights to install them).

Thanks in advance.

1 solution

#1


Do you mean the SUBSET SUM problem, as it is known in computer science?

Hint: Look in the related questions, there are MANY questions/answers about that problem.
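As a rough illustration of one standard approach to subset sum, here is a sketch of the classic dynamic-programming count. It only counts how many subsets hit the target instead of listing them. The name `count_subsets` and the `places` parameter are hypothetical, and the sketch assumes that every value is non-negative and has at most `places` decimal digits, so that everything can be scaled to integers. The table has one entry per scaled unit up to the target, which may be too much memory when the target is very large.

```python
from decimal import Decimal


def count_subsets(values, target, places=2):
    """Count the subsets of `values` that add up to `target`.

    Assumption: values are non-negative with at most `places` decimal digits.
    """
    scale = 10 ** places
    ints = [int(Decimal(str(v)) * scale) for v in values]
    goal = Decimal(str(target)) * scale
    if goal < 0 or goal != goal.to_integral_value():
        return 0  # target cannot be expressed on the scaled integer grid
    goal = int(goal)

    # ways[s] = number of subsets of the values seen so far that sum to s
    ways = [0] * (goal + 1)
    ways[0] = 1  # the empty subset
    for x in ints:
        # go downwards so that each value is used at most once
        for s in range(goal, x - 1, -1):
            ways[s] += ways[s - x]
    return ways[goal]


print(count_subsets(["1", "2", "3", "4", "5"], "5"))
```

If the count comes back as 1, there is exactly one candidate set of negative records; a large count suggests that the sums alone do not pin down which records are negative.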
