I'm hopeless with regular expressions, so I need some help with a problem that I think is best solved with one.
I have a list of strings in C#:
List<string> lstNames = new List<string>();
lstNames.Add("TRA-94:23");
lstNames.Add("TRA-42:101");
lstNames.Add("TRA-109:AD");
foreach (string n in lstNames) {
    // logic goes here that somehow uses regex to remove all special characters
    string regExp = "NO_IDEA";
    string tmp = Regex.Replace(n, regExp, "");
}
I need to be able to loop over the list and return each item without any special characters. For example, item one would be "TRA9423", item two would be "TRA42101", and item three would be "TRA109AD".
Is there a regular expression that can accomplish this for me?
Also, the list contains more than 4000 items, so I need the search and replace to be efficient and quick if possible.
EDIT: I should have specified that any character besides a-z, A-Z and 0-9 is special in my circumstance.
8 answers
#1
95
It really depends on your definition of special characters. I find that a whitelist rather than a blacklist is the best approach in most situations:
tmp = Regex.Replace(n, "[^0-9a-zA-Z]+", "");
You should be careful with your current approach because the following two items will be converted to the same string and will therefore be indistinguishable:
"TRA-12:123"
"TRA-121:23"
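To make the collision concrete, here is a minimal sketch that applies the whitelist pattern with one shared Regex instance (so the pattern is parsed once, which should be plenty for a few thousand items) and reports which original names end up identical. The class name NameCleaner and the methods Strip and FindCollisions are made up for this illustration; they are not part of the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, named for this example only.
public static class NameCleaner
{
    // One shared Regex instance, so the pattern is parsed once
    // rather than on every call.
    private static readonly Regex NonAlphanumeric = new Regex("[^0-9a-zA-Z]+");

    // Removes every character that is not an ASCII letter or digit.
    public static string Strip(string input)
    {
        return NonAlphanumeric.Replace(input, "");
    }

    // Groups the original names by their stripped form and keeps only the
    // groups with more than one member, i.e. names that collide once stripped.
    public static Dictionary<string, List<string>> FindCollisions(IEnumerable<string> names)
    {
        return names
            .GroupBy(n => Strip(n))
            .Where(g => g.Count() > 1)
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

With this sketch, NameCleaner.Strip("TRA-94:23") returns "TRA9423", while "TRA-12:123" and "TRA-121:23" both map to "TRA12123" and so show up together in the result of FindCollisions.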
#2
16
This should do it:
[^a-zA-Z0-9]
Basically it matches all non-alphanumeric characters.
#3
16
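[^a-zA-Z0-9] is a character class that matches any non-alphanumeric character.
Alternatively, [^\w\d] does almost the same thing. The difference is that \w also covers the underscore and, by default in .NET, non-ASCII letters and digits, so those characters are not matched (and \d is redundant, since \w already includes digits).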
Usage:
string regExp = @"[^\w\d]"; // verbatim string, so \w and \d reach the regex engine
string tmp = Regex.Replace(n, regExp, "");
#4
7
You can use:
string regExp = "\\W";
This is close to Daniel's "[^a-zA-Z0-9]", but not identical: \W leaves underscores in place, and by default it also leaves non-ASCII letters and digits in place.
\W matches any nonword character. Equivalent to the Unicode categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
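The difference between \W and the explicit character class is easiest to see side by side. The following is a small sketch, not taken from any of the answers; the class PatternComparison and its method names are invented for the comparison. It also shows RegexOptions.ECMAScript, which restricts \w to [a-zA-Z_0-9].

```csharp
using System.Text.RegularExpressions;

// Hypothetical comparison helper, named for this example only.
public static class PatternComparison
{
    // Explicit whitelist: removes everything except ASCII letters and digits.
    public static string StripWithClass(string input)
    {
        return Regex.Replace(input, "[^a-zA-Z0-9]", "");
    }

    // Default \W: keeps underscores and non-ASCII letters and digits.
    public static string StripWithNonWord(string input)
    {
        return Regex.Replace(input, @"\W", "");
    }

    // ECMAScript mode: \W removes non-ASCII letters too,
    // but the underscore is still kept.
    public static string StripWithEcmaNonWord(string input)
    {
        return Regex.Replace(input, @"\W", "", RegexOptions.ECMAScript);
    }
}
```

For the input "TRA_94:1\u00E9" (ending in é), StripWithClass returns "TRA941", StripWithNonWord returns "TRA_941\u00E9", and StripWithEcmaNonWord returns "TRA_941". If underscores count as special, as they do in the question's edit, the explicit class is the safer choice.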
#5
3
Depending on your definition of "special character", I think "[^a-zA-Z0-9]" would probably do the trick. That would find anything that is not a small letter, a capital letter, or a digit.
#6
2
tmp = Regex.Replace(n, @"\W+", "");
\w matches letters, digits, and underscores; \W is the negated version.
#7
2
For my purposes I wanted to keep only ASCII characters, so this works (the ASCII range ends at \x7F):
html = Regex.Replace(html, @"[^\x00-\x7F]+", "");
#8
0
If you don't want to use Regex, another option is char.IsLetterOrDigit.
You can loop through each char of the string and keep only the characters for which it returns true.
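A minimal sketch of that loop might look like the following. The class CharFilter and both method names are invented for illustration. One caveat worth knowing: char.IsLetterOrDigit also returns true for non-ASCII letters and digits, so a second, stricter variant is included for the a-z, A-Z, 0-9 rule from the question.

```csharp
using System.Text;

// Hypothetical helper, named for this example only.
public static class CharFilter
{
    // Keeps every character for which char.IsLetterOrDigit returns true.
    // This includes non-ASCII letters and digits.
    public static string KeepLettersAndDigits(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (char.IsLetterOrDigit(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    // Stricter variant: keeps only a-z, A-Z and 0-9.
    public static string KeepAsciiLettersAndDigits(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            bool isAsciiAlphanumeric =
                (c >= 'a' && c <= 'z') ||
                (c >= 'A' && c <= 'Z') ||
                (c >= '0' && c <= '9');
            if (isAsciiAlphanumeric)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Both methods turn "TRA-109:AD" into "TRA109AD"; they differ only on input such as "a_\u00E9", where the first keeps the é and the second returns just "a".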