In my program I will have multiple array with about 40 000 strings each with different length (from 10 to 5000 character), I need to send this array to an API wich only accept 5 000 character at a time.
在我的程序中,我将有多个数组,其中大约有40,000个字符串,每个字符串具有不同的长度(从10到5000个字符),我需要将此数组发送到一个只接受5 000个字符的API。
In order to make the fewest API call I need to find the best combinations of string to send each time.
为了进行最少的API调用,我需要找到每次发送的最佳字符串组合。
For example if I got an array with different lenght {3, 5, 10, 3, 4, 1, 4} and the maximum lenght of the api is 10. It should returns {10}, {4 1 5}, {3 3 4}.
例如,如果我得到一个具有不同长度{3,5,10,3,4,1,4}的数组,并且api的最大长度为10.它应该返回{10},{4 1 5},{3 3 4}。
I've been looking through different algorithm but no one seems to fill my need. (Subset Sum and others)
我一直在寻找不同的算法,但似乎没有人满足我的需要。 (子集和其他)
Any help is greatly appreciated!
任何帮助是极大的赞赏!
5 个解决方案
#1
17
Your problem is Bin Packing problem. Please find pretty nice solution in following paper: A new algorithm for optimal bin packing by Richard Korf (see example problem there)
你的问题是Bin Packing问题。请在下面的文章中找到非常好的解决方案:Richard Korf的最佳装箱算法(参见示例问题)
Let us see the example for array:
让我们看一下数组的例子:
MAXSIZE=20
[1 2 4 5 7 10 11]
With algorithm from referenced paper you will get:
使用参考纸张的算法,您将获得:
[11 4 5] [10 7 2 1]
In short this algorithm build bin by:
简而言之,这个算法构建bin:
-
insert into bin maximal element
插入bin最大元素
-
Search for all elements that fits to volume left and maximize their sum
搜索适合剩余音量的所有元素并最大化它们的总和
For example in our case first step would be:
例如,在我们的例子中,第一步将是:
# Take max element
[11]
# We have 9 volume left
# All smaller are [1 2 4 5 7] - greedy would take 7 in this case
# 4 and 5 sums up to 9 which is best fit in this case so first bin become:
[11 5 4]
# Next step: take max
[10]
# we have 10 volume left. elements lower than 10:
# [1 2 7]
# this sums up to 10 in this case giving second bin
[10 7 2 1]
And just some example of greedy vs mentioned one:
只是一些贪婪的例子提到了一个:
ARR = [3, 3, 5, 5, 5, 5, 14]
BINSIZE = 20
Greedy result:
Size 3:
[[14, 5], [5, 5, 5, 3], [3]]
Mentioned alg result (size 2):
[[14, 3, 3], [5, 5, 5, 5]]
Also you may be interested in section 'Exact algorithm' on wiki page.
您可能对wiki页面上的“精确算法”部分感兴趣。
#2
0
Definitely looks like a dynamic programming problem. Your question is similar to the Subset Sum problem, except that instead of just finding if such a subset exists, you want to return all such subsets.
绝对看起来像一个动态编程问题。您的问题与子集和问题类似,不同之处在于您不想仅查找是否存在此类子集,而是要返回所有此类子集。
This link seems to be close to what you need: http://www.careercup.com/question?id=12899672
这个链接似乎接近你所需要的:http://www.careercup.com/question?id = 12899672
Dynamic programming is often pretty tough to wrap your head around. I'm hoping someone else will provide a thorough explanation (for me as well), but hopefully this gives you somewhere to start.
动态编程通常非常难以理解。我希望其他人能提供一个彻底的解释(对我而言),但希望这会给你一个开始的地方。
#3
0
This is the dynamic programming problem (Subset sum problem) with a variation. Not only do we want to find if the sum exists, but we also want to find all the different subsets too.
这是带有变化的动态编程问题(子集和问题)。我们不仅想要找到总和是否存在,而且我们也希望找到所有不同的子集。
We build a 2-d boolean lookup table of sum(rows) - vs - number(cols) as is typical in many DP problems. For finding subsets which match the sum exactly, we can call the following backtracking function on the lookup table to find possible valid sums.
我们构建了一个二维布尔查找表,其中包含sum(rows) - vs - number(cols),这在许多DP问题中是典型的。为了找到与总和完全匹配的子集,我们可以在查找表上调用以下回溯函数来查找可能的有效总和。
bool backtrack(bool **subset, int sum, int a[], int n) {
if(sum == 0) { // Sum possible
return true;
}
if(sum < 0) { //Sum not possible
return false;
}
for(int j=1; j<=n; j++) {
if(subset[sum][j] == true) {
int val = a[j-1];
// If val is included, can we have a valid sum?
bool valid = backtrack(subset, sum-val, a, j-1);
if(valid == true) {
printf("%d ", val);
return true;
}
}
}
return false;
}
We can call the above function in this manner to print out the number combinations, one combination in each row-
我们可以这种方式调用上述功能来打印出数字组合,每行一个组合 -
for(j=1; j<=n; j++) {
if(subset[sum][j] == 1) { //For every col which is =1 for the sum'th row
bool valid = backtrack(subset, sum-a[j-1], a, j-1);
if(valid) {
printf("%d\n", a[j-1]);
}
}
}
#4
0
How does this work for you? Obviously, you can change the max to be whatever you want, and probably change it to be set from the calling function, but i'll leave those choices up to you.
这对你有什么用?显然,您可以将max更改为您想要的任何内容,并可能将其更改为从调用函数设置,但我会将这些选择留给您。
This worked for me, let me know if you have any issues with it.
这对我有用,请告诉我你是否有任何问题。
List<List<string>> Chunk(List<string> inputStrings)
{
List<List<string>> retVal = new List<List<string>>();
List<string> sortedStrings = inputStrings.OrderByDescending(s => s.Length).ToList();
while (sortedStrings.Any())
{
List<string> set = new List<string>();
int max = 10;
for (int i = 0; i < sortedStrings.Count(); ++i)
{
if (max == 0)
break;
if (max - sortedStrings[i].Length < 0)
continue;
set.Add(sortedStrings[i]);
max -= sortedStrings[i].Length;
sortedStrings.RemoveAt(i);
--i;
}
if(set.Any())
retVal.Add(set);
}
return retVal;
}
Note: This is C#. If needed, I can redo it in another language or with different data structures.
注意:这是C#。如果需要,我可以用另一种语言或不同的数据结构重做它。
#5
0
This seem to be resolved by a Greedy algorithm and not backtrack algorithm that should be executed before sending string to the API.
这似乎是通过Greedy算法解决的,而不是在将字符串发送到API之前应该执行的回溯算法。
#1
17
Your problem is Bin Packing problem. Please find pretty nice solution in following paper: A new algorithm for optimal bin packing by Richard Korf (see example problem there)
你的问题是Bin Packing问题。请在下面的文章中找到非常好的解决方案:Richard Korf的最佳装箱算法(参见示例问题)
Let us see the example for array:
让我们看一下数组的例子:
MAXSIZE=20
[1 2 4 5 7 10 11]
With algorithm from referenced paper you will get:
使用参考纸张的算法,您将获得:
[11 4 5] [10 7 2 1]
In short this algorithm build bin by:
简而言之,这个算法构建bin:
-
insert into bin maximal element
插入bin最大元素
-
Search for all elements that fits to volume left and maximize their sum
搜索适合剩余音量的所有元素并最大化它们的总和
For example in our case first step would be:
例如,在我们的例子中,第一步将是:
# Take max element
[11]
# We have 9 volume left
# All smaller are [1 2 4 5 7] - greedy would take 7 in this case
# 4 and 5 sums up to 9 which is best fit in this case so first bin become:
[11 5 4]
# Next step: take max
[10]
# we have 10 volume left. elements lower than 10:
# [1 2 7]
# this sums up to 10 in this case giving second bin
[10 7 2 1]
And just some example of greedy vs mentioned one:
只是一些贪婪的例子提到了一个:
ARR = [3, 3, 5, 5, 5, 5, 14]
BINSIZE = 20
Greedy result:
Size 3:
[[14, 5], [5, 5, 5, 3], [3]]
Mentioned alg result (size 2):
[[14, 3, 3], [5, 5, 5, 5]]
Also you may be interested in section 'Exact algorithm' on wiki page.
您可能对wiki页面上的“精确算法”部分感兴趣。
#2
0
Definitely looks like a dynamic programming problem. Your question is similar to the Subset Sum problem, except that instead of just finding if such a subset exists, you want to return all such subsets.
绝对看起来像一个动态编程问题。您的问题与子集和问题类似,不同之处在于您不想仅查找是否存在此类子集,而是要返回所有此类子集。
This link seems to be close to what you need: http://www.careercup.com/question?id=12899672
这个链接似乎接近你所需要的:http://www.careercup.com/question?id = 12899672
Dynamic programming is often pretty tough to wrap your head around. I'm hoping someone else will provide a thorough explanation (for me as well), but hopefully this gives you somewhere to start.
动态编程通常非常难以理解。我希望其他人能提供一个彻底的解释(对我而言),但希望这会给你一个开始的地方。
#3
0
This is the dynamic programming problem (Subset sum problem) with a variation. Not only do we want to find if the sum exists, but we also want to find all the different subsets too.
这是带有变化的动态编程问题(子集和问题)。我们不仅想要找到总和是否存在,而且我们也希望找到所有不同的子集。
We build a 2-d boolean lookup table of sum(rows) - vs - number(cols) as is typical in many DP problems. For finding subsets which match the sum exactly, we can call the following backtracking function on the lookup table to find possible valid sums.
我们构建了一个二维布尔查找表,其中包含sum(rows) - vs - number(cols),这在许多DP问题中是典型的。为了找到与总和完全匹配的子集,我们可以在查找表上调用以下回溯函数来查找可能的有效总和。
bool backtrack(bool **subset, int sum, int a[], int n) {
if(sum == 0) { // Sum possible
return true;
}
if(sum < 0) { //Sum not possible
return false;
}
for(int j=1; j<=n; j++) {
if(subset[sum][j] == true) {
int val = a[j-1];
// If val is included, can we have a valid sum?
bool valid = backtrack(subset, sum-val, a, j-1);
if(valid == true) {
printf("%d ", val);
return true;
}
}
}
return false;
}
We can call the above function in this manner to print out the number combinations, one combination in each row-
我们可以这种方式调用上述功能来打印出数字组合,每行一个组合 -
for(j=1; j<=n; j++) {
if(subset[sum][j] == 1) { //For every col which is =1 for the sum'th row
bool valid = backtrack(subset, sum-a[j-1], a, j-1);
if(valid) {
printf("%d\n", a[j-1]);
}
}
}
#4
0
How does this work for you? Obviously, you can change the max to be whatever you want, and probably change it to be set from the calling function, but i'll leave those choices up to you.
这对你有什么用?显然,您可以将max更改为您想要的任何内容,并可能将其更改为从调用函数设置,但我会将这些选择留给您。
This worked for me, let me know if you have any issues with it.
这对我有用,请告诉我你是否有任何问题。
List<List<string>> Chunk(List<string> inputStrings)
{
List<List<string>> retVal = new List<List<string>>();
List<string> sortedStrings = inputStrings.OrderByDescending(s => s.Length).ToList();
while (sortedStrings.Any())
{
List<string> set = new List<string>();
int max = 10;
for (int i = 0; i < sortedStrings.Count(); ++i)
{
if (max == 0)
break;
if (max - sortedStrings[i].Length < 0)
continue;
set.Add(sortedStrings[i]);
max -= sortedStrings[i].Length;
sortedStrings.RemoveAt(i);
--i;
}
if(set.Any())
retVal.Add(set);
}
return retVal;
}
Note: This is C#. If needed, I can redo it in another language or with different data structures.
注意:这是C#。如果需要,我可以用另一种语言或不同的数据结构重做它。
#5
0
This seem to be resolved by a Greedy algorithm and not backtrack algorithm that should be executed before sending string to the API.
这似乎是通过Greedy算法解决的,而不是在将字符串发送到API之前应该执行的回溯算法。