What would be the most efficient way to compare a known string against and array of strings in order to see if the given string matches any in the array?
比较已知字符串和字符串数组的最有效方法是什么?
For example: You have
例如:你有
string String1 = "ID5";
string String2 = "ID7";
You want to see if either of them are contained in the following
您希望查看它们是否包含在以下内容中
string List[5] = {"ID1", "ID7", "ID10", "ID34", "ID62"}
So that you would be able to do this
这样你就能做到
if(#STRINGMATCHES) {
// Do one thing
}
else {
// Do another
}
5 个解决方案
#2
2
If you need to perform this search operation a lot of times here is what I propose - hash all the strings using some hash function and then create a new array containing the sorted hashes. Then when you need to check if a string is contained in the array do a binary_search of its hash in the sorted array. This will be way more efficient then doing just std::find as proposed by als but depend on the fact you will need to perform the search operation enough times so that the speed gain makes up for the sorting overhead.
如果您需要多次执行这个搜索操作,我的建议是——使用一些散列函数对所有字符串进行散列,然后创建一个包含排序散列的新数组。然后,当您需要检查数组中是否包含字符串时,请在排序数组中对其散列进行binary_search。这将比只执行std::find更有效,但这取决于您需要执行足够多的搜索操作,以便速度的增加可以弥补排序开销。
#3
1
If the array is sorted you can use std::binary_search()
:
如果数组已排序,您可以使用std::binary_search():
std::string List[] = { "ID1", "ID10", "ID7", "ID34", "ID62" };
if (std::binary_search(std::begin(List), std::end(List), "ID7"))
{
std::cout << "found string\n";
}
If not, the use std::find()
(as already stated by Als).
如果不是,则使用std::find()(如Als所述)。
#4
1
The simplest solution would be to put the strings you're looking for into an array and use std::find_first_of
:
最简单的解决方案是将要查找的字符串放入一个数组,并使用std::find_first_of:
std::string targetList[] = { "ID5", "ID7" };
std::string searchList[] = { "ID1", "ID2", "ID3", "ID4", "ID5" };
if ( std::find_first_of( begin( searchList ), end( searchList ),
begin( targetList ), end( targetList ) )
!= end( targetList ) ) {
// found...
} else {
// not found...
}
This is not necessarily the most efficient solution, because find_first_of
makes no assumtions concerning the data. If the search list is very large and doesn't change, for example, and the target list only contains a few elements, it might be more efficient to sort the search list, and do a binary search for each element in the target list.
这并不一定是最有效的解决方案,因为find_first_of没有对数据进行任何假设。例如,如果搜索列表非常大并且没有变化,而目标列表只包含几个元素,那么对搜索列表进行排序,并对目标列表中的每个元素进行二进制搜索可能会更有效。
#5
0
I have an idea.
我有个主意。
first, we should make List sorted.just as hmjd describing.
首先,我们应该对列表进行排序。正如hmjd描述。
when comparing two strings, we can record some information.
当比较两个字符串时,我们可以记录一些信息。
For example,
例如,
table two dimenssion array dif records the index where two strings differs.
表二维数组dif记录了两个字符串不同的索引。
string[2] = {"abc","abd"}
list[5] = {"aab","abb","abc","bcd","ef"}
dif[0][0] = 1 ("abc" and "aab" differ at index 1)
dif[0][1] = 2 ("abc" and "abb" differ at index 2)
dif[0][2] = -1 ("abc" and "abc" are same, so we use -1 to represent two strings are same)
dif[0][3] = 0 ("abc" and "bcd" differ at index 0)
dif[0][4] = 0 ("abc" and "eg" differ at index 0)
when we need to compare a new string to the strings in list.We first find the most similar string in the strings that have been compared. eg, "abd" is a string needed to judge. We find "abc". "abd" and "abc" differ at index 2. So ,when we compare "adb" and the strings in list, we needn't to compare the strings that differ "abc" in index before 2. eg, we needn't to compare "abd" and "ef",because "abd" differs "abc" at index 2 while "abc" differ "ef" at index 0.
当我们需要将一个新的字符串与列表中的字符串进行比较时。我们首先在被比较的字符串中找到最相似的字符串。abd是用来判断的字符串。我们发现“abc”。abd和abc在索引2中不同。所以,当我们比较“adb”和列表中的字符串时,我们不需要在2之前比较索引中与“abc”不同的字符串。我们不需要比较abd和ef,因为abd在索引2中与abc不同,而abc在索引0中与ef不同。
My idea is very rough and has many details to be considered. I think it is useful,especially in large scale problem.
我的想法很粗略,有很多细节需要考虑。我认为它是有用的,尤其是在大规模问题中。
#1
#2
2
If you need to perform this search operation a lot of times here is what I propose - hash all the strings using some hash function and then create a new array containing the sorted hashes. Then when you need to check if a string is contained in the array do a binary_search of its hash in the sorted array. This will be way more efficient then doing just std::find as proposed by als but depend on the fact you will need to perform the search operation enough times so that the speed gain makes up for the sorting overhead.
如果您需要多次执行这个搜索操作,我的建议是——使用一些散列函数对所有字符串进行散列,然后创建一个包含排序散列的新数组。然后,当您需要检查数组中是否包含字符串时,请在排序数组中对其散列进行binary_search。这将比只执行std::find更有效,但这取决于您需要执行足够多的搜索操作,以便速度的增加可以弥补排序开销。
#3
1
If the array is sorted you can use std::binary_search()
:
如果数组已排序,您可以使用std::binary_search():
std::string List[] = { "ID1", "ID10", "ID7", "ID34", "ID62" };
if (std::binary_search(std::begin(List), std::end(List), "ID7"))
{
std::cout << "found string\n";
}
If not, the use std::find()
(as already stated by Als).
如果不是,则使用std::find()(如Als所述)。
#4
1
The simplest solution would be to put the strings you're looking for into an array and use std::find_first_of
:
最简单的解决方案是将要查找的字符串放入一个数组,并使用std::find_first_of:
std::string targetList[] = { "ID5", "ID7" };
std::string searchList[] = { "ID1", "ID2", "ID3", "ID4", "ID5" };
if ( std::find_first_of( begin( searchList ), end( searchList ),
begin( targetList ), end( targetList ) )
!= end( targetList ) ) {
// found...
} else {
// not found...
}
This is not necessarily the most efficient solution, because find_first_of
makes no assumtions concerning the data. If the search list is very large and doesn't change, for example, and the target list only contains a few elements, it might be more efficient to sort the search list, and do a binary search for each element in the target list.
这并不一定是最有效的解决方案,因为find_first_of没有对数据进行任何假设。例如,如果搜索列表非常大并且没有变化,而目标列表只包含几个元素,那么对搜索列表进行排序,并对目标列表中的每个元素进行二进制搜索可能会更有效。
#5
0
I have an idea.
我有个主意。
first, we should make List sorted.just as hmjd describing.
首先,我们应该对列表进行排序。正如hmjd描述。
when comparing two strings, we can record some information.
当比较两个字符串时,我们可以记录一些信息。
For example,
例如,
table two dimenssion array dif records the index where two strings differs.
表二维数组dif记录了两个字符串不同的索引。
string[2] = {"abc","abd"}
list[5] = {"aab","abb","abc","bcd","ef"}
dif[0][0] = 1 ("abc" and "aab" differ at index 1)
dif[0][1] = 2 ("abc" and "abb" differ at index 2)
dif[0][2] = -1 ("abc" and "abc" are same, so we use -1 to represent two strings are same)
dif[0][3] = 0 ("abc" and "bcd" differ at index 0)
dif[0][4] = 0 ("abc" and "eg" differ at index 0)
when we need to compare a new string to the strings in list.We first find the most similar string in the strings that have been compared. eg, "abd" is a string needed to judge. We find "abc". "abd" and "abc" differ at index 2. So ,when we compare "adb" and the strings in list, we needn't to compare the strings that differ "abc" in index before 2. eg, we needn't to compare "abd" and "ef",because "abd" differs "abc" at index 2 while "abc" differ "ef" at index 0.
当我们需要将一个新的字符串与列表中的字符串进行比较时。我们首先在被比较的字符串中找到最相似的字符串。abd是用来判断的字符串。我们发现“abc”。abd和abc在索引2中不同。所以,当我们比较“adb”和列表中的字符串时,我们不需要在2之前比较索引中与“abc”不同的字符串。我们不需要比较abd和ef,因为abd在索引2中与abc不同,而abc在索引0中与ef不同。
My idea is very rough and has many details to be considered. I think it is useful,especially in large scale problem.
我的想法很粗略,有很多细节需要考虑。我认为它是有用的,尤其是在大规模问题中。