For the following block of code:
For I = 0 To listOfStrings.Count - 1
    If myString.Contains(listOfStrings.Item(I)) Then
        Return True
    End If
Next
Return False
The output is:
Case 1:
myString: C:\Files\myfile.doc
listOfStrings: C:\Files\, C:\Files2\
Result: True
Case 2:
myString: C:\Files3\myfile.doc
listOfStrings: C:\Files\, C:\Files2\
Result: False
The list (listOfStrings) may contain several items (minimum 20), and it has to be checked against thousands of strings (like myString).
Is there a better (more efficient) way to write this code?
10 Solutions
#1
With LINQ, and using C# (I don't know VB much these days):
bool b = listOfStrings.Any(s=>myString.Contains(s));
or (shorter and more efficient, but arguably less clear):
bool b = listOfStrings.Any(myString.Contains);
If you were testing equality, it would be worth looking at HashSet etc., but this won't help with partial matches unless you split the string into fragments, which adds an order of complexity.
Update: if you really mean "StartsWith", then you could sort the list and place it into an array, then use Array.BinarySearch to find each item. Check the result of the lookup to see whether it is a full or partial match.
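A minimal sketch of the sorted-array idea, not a definitive implementation. It assumes ordinal comparison and that no list entry is a prefix of another entry (such a nested entry is redundant for a yes/no answer and can be dropped first). The names PrefixLookup, Prepare and StartsWithAny are made up for illustration.

```csharp
using System;

public static class PrefixLookup
{
    // Returns a sorted copy of the prefixes; ordinal order keeps the search culture-independent.
    public static string[] Prepare(string[] prefixes)
    {
        var sorted = (string[])prefixes.Clone();
        Array.Sort(sorted, StringComparer.Ordinal);
        return sorted;
    }

    // Assumption: no entry of sortedPrefixes is itself a prefix of another entry.
    public static bool StartsWithAny(string[] sortedPrefixes, string value)
    {
        int index = Array.BinarySearch(sortedPrefixes, value, StringComparer.Ordinal);
        if (index >= 0)
            return true; // full match: value is itself in the list

        // ~index is the insertion point, so the entry just before it is the
        // largest entry that sorts below value. Under the assumption above,
        // only that entry can be a prefix of value.
        int candidate = ~index - 1;
        return candidate >= 0
            && value.StartsWith(sortedPrefixes[candidate], StringComparison.Ordinal);
    }
}
```

Sort once with Prepare, then call StartsWithAny for each of the thousands of strings; each check is a binary search plus one StartsWith.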
#2
There were a number of suggestions from an earlier similar question "Best way to test for existing string against a large list of comparables".
Regex might be sufficient for your requirement. The expression would be a concatenation of all the candidate substrings, with an OR "|" operator between them. Of course, you'll have to escape special characters when building the expression, and watch for a failure to compile it because of complexity or size limitations.
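A rough sketch of the regex approach, assuming a non-empty candidate list (an empty list would produce an empty pattern that matches everything). RegexLookup and Build are hypothetical names; Regex.Escape and string.Join are the standard .NET calls.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexLookup
{
    // Builds one alternation pattern from the escaped candidates.
    // Assumption: candidates is non-empty.
    public static Regex Build(string[] candidates)
    {
        // Regex.Escape neutralises metacharacters such as '\' and '.' in paths.
        string pattern = string.Join("|", candidates.Select(c => Regex.Escape(c)));
        return new Regex(pattern, RegexOptions.Compiled | RegexOptions.CultureInvariant);
    }
}
```

Build the Regex once and reuse it, e.g. `RegexLookup.Build(list).IsMatch(myString)` for each string to test. Whether this beats the plain loop is worth measuring on your own data.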
Another way to do this would be to construct a trie data structure to represent all the candidate substrings (this may somewhat duplicate what the regex matcher is doing). As you step through each character in the test string, you would create a new pointer to the root of the trie, and advance existing pointers to the appropriate child (if any), dropping any pointer that has no matching child. You get a match when any pointer reaches a node where a candidate ends (a leaf, unless one candidate is a prefix of another).
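The pointer-stepping described above could look roughly like this. It is only a sketch: SubstringTrie and ContainsAny are hypothetical names, and it assumes the candidates are non-empty strings.

```csharp
using System.Collections.Generic;

public sealed class SubstringTrie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsEnd; // true when a candidate ends at this node
    }

    private readonly Node root = new Node();

    // Assumption: every candidate is a non-empty string.
    public SubstringTrie(IEnumerable<string> candidates)
    {
        foreach (string candidate in candidates)
        {
            Node node = root;
            foreach (char c in candidate)
            {
                if (!node.Children.TryGetValue(c, out Node next))
                {
                    next = new Node();
                    node.Children[c] = next;
                }
                node = next;
            }
            node.IsEnd = true;
        }
    }

    // True if any candidate occurs anywhere in text.
    public bool ContainsAny(string text)
    {
        var active = new List<Node>();
        foreach (char c in text)
        {
            active.Add(root); // a match may start at this character
            var advanced = new List<Node>();
            foreach (Node node in active)
            {
                if (node.Children.TryGetValue(c, out Node child))
                {
                    if (child.IsEnd) return true;
                    advanced.Add(child);
                }
                // pointers with no matching child are dropped
            }
            active = advanced;
        }
        return false;
    }
}
```

Build the trie once from the list, then call ContainsAny for each string to test.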
#3
When you construct your strings, it should look like this:
bool inact = new string[] { "SUSPENDARE", "DIZOLVARE" }.Any(s=>stare.Contains(s));
#4
I liked Marc's answer, but needed the Contains matching to be CaSe InSenSiTiVe.
This was the solution:
bool b = listOfStrings.Any(s => myString.IndexOf(s, StringComparison.OrdinalIgnoreCase) >= 0);
#5
Based on your patterns, one improvement would be to use StartsWith instead of Contains. StartsWith only needs to compare each list entry against the beginning of myString and stops at the first mismatch, whereas Contains has to retry the comparison at every character position of myString.
Also, based on your patterns, it looks like you may be able to extract the first part of the path for myString, then reverse the comparison -- looking for the starting path of myString in the list of strings rather than the other way around.
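// Requires using System.IO. Assumes myString has at least a drive and one directory.
string[] pathComponents = myString.Split( Path.DirectorySeparatorChar );
// "C:\Files\myfile.doc" -> "C:\Files\", the form used by the list entries in the question.
string startPath = pathComponents[0] + Path.DirectorySeparatorChar + pathComponents[1] + Path.DirectorySeparatorChar;
return listOfStrings.Contains( startPath );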
EDIT: This would be even faster using the HashSet idea @Marc Gravell mentions: put the list into a HashSet<string>, and the Contains lookup becomes O(1) on average instead of O(N). You would have to make sure that the paths match exactly. Note that this is not a general solution like @Marc Gravell's, but is tailored to your examples.
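A sketch of the reversed lookup with a HashSet, under the assumption that every list entry is a directory path ending with the separator, as in the question's examples. RootLookup and IsUnderAnyRoot are hypothetical names. Note that this has StartsWith semantics, not Contains semantics.

```csharp
using System;
using System.Collections.Generic;

public static class RootLookup
{
    // Assumption: every entry in roots ends with the separator, e.g. "C:\Files\".
    public static bool IsUnderAnyRoot(HashSet<string> roots, string path, char separator = '\\')
    {
        // Try each leading directory prefix in turn: "C:\", "C:\Files\", ...
        for (int i = path.IndexOf(separator); i >= 0; i = path.IndexOf(separator, i + 1))
        {
            if (roots.Contains(path.Substring(0, i + 1)))
                return true;
        }
        return false;
    }
}
```

The set could be built once with `new HashSet<string>(listOfStrings, StringComparer.OrdinalIgnoreCase)`. The case-insensitive comparer is my assumption for Windows paths; drop it if case matters. The cost per string then depends on the depth of the path rather than on the size of the list.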
Sorry for the C# example. I haven't had enough coffee to translate to VB.
#6
I'm not sure if it's more efficient, but you could think about using lambda expressions.
#7
Have you tested the speed?
That is, have you created a sample set of data and profiled it? It may not be as bad as you think.
This might also be something you could spawn off into a separate thread and give the illusion of speed!
#8
If speed is critical, you might want to look at the Aho-Corasick algorithm for sets of patterns.
It's a trie with failure links, which gives a complexity of O(n+m+k), where n is the length of the input text, m the cumulative length of the patterns, and k the number of matches. You just have to modify the algorithm to terminate after the first match is found.
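For illustration, a compact sketch of Aho-Corasick that stops at the first match. This is my own minimal version, not a library API: the class name, the dictionary-based nodes and the ContainsAny method are assumptions, and it expects non-empty patterns.

```csharp
using System.Collections.Generic;

public sealed class AhoCorasick
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Next = new Dictionary<char, Node>();
        public Node Fail;
        public bool IsMatch; // a pattern ends here, or at a node reachable via failure links
    }

    private readonly Node root = new Node();

    // Assumption: every pattern is a non-empty string.
    public AhoCorasick(IEnumerable<string> patterns)
    {
        // 1. Build the trie of all patterns.
        foreach (string pattern in patterns)
        {
            Node node = root;
            foreach (char c in pattern)
            {
                if (!node.Next.TryGetValue(c, out Node child))
                {
                    child = new Node();
                    node.Next[c] = child;
                }
                node = child;
            }
            node.IsMatch = true;
        }

        // 2. Breadth-first pass to set the failure links.
        var queue = new Queue<Node>();
        foreach (Node first in root.Next.Values)
        {
            first.Fail = root;
            queue.Enqueue(first);
        }
        while (queue.Count > 0)
        {
            Node node = queue.Dequeue();
            foreach (KeyValuePair<char, Node> edge in node.Next)
            {
                // Follow the parent's failure chain until a node has an edge for this character.
                Node fail = node.Fail;
                Node target;
                while (!fail.Next.TryGetValue(edge.Key, out target) && fail != root)
                    fail = fail.Fail;
                edge.Value.Fail = target ?? root;
                edge.Value.IsMatch |= edge.Value.Fail.IsMatch;
                queue.Enqueue(edge.Value);
            }
        }
    }

    // True if any pattern occurs anywhere in text; returns at the first match.
    public bool ContainsAny(string text)
    {
        Node node = root;
        foreach (char c in text)
        {
            Node next;
            while (!node.Next.TryGetValue(c, out next) && node != root)
                node = node.Fail;
            node = next ?? root;
            if (node.IsMatch) return true;
        }
        return false;
    }
}
```

The automaton is built once from the list and reused for every string. With only 20 or so short patterns, the plain loop may well be fast enough, so it is worth profiling before adopting this.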
#9
myList.Any(myString.Contains);
#10
The drawback of the Contains method is that it does not let you specify the comparison type, which is often important when comparing strings. Contains(string) always performs an ordinal, case-sensitive comparison. So I think the answer of WhoIsRich is valuable; I just want to show a simpler alternative (note that Equals tests for a whole-string match rather than a substring):
listOfStrings.Any(s => s.Equals(myString, StringComparison.OrdinalIgnoreCase))