Regular expression ".*[^a-zA-Z0-9_].*"

Time: 2021-10-15 17:01:45

As I am trying to read more about regular expressions in C#, I just want to make sure of a conclusion I made. For the expression ".*[^a-zA-Z0-9_].*", the ".*" at the beginning and end are useless, is that right? As I understand it, ".*" means zero or more occurrences of any character. But since it is followed by "[^a-zA-Z0-9_]", which means any character other than a letter (upper or lower case), a digit, or an underscore, adding ".*" before and after "[^a-zA-Z0-9_]" seems useless. Is that right?

Here is the code I am using to check whether the expression matches:

using System;
using System.Text.RegularExpressions;

// Here we call Regex.Match.
Match match = Regex.Match("anytest#", ".*[^a-zA-Z0-9_].*");
//Match match = Regex.Match("anytest#", "[^a-zA-Z0-9_]");

// Here we check the Match instance: success means a disallowed character was found.
if (match.Success)
    Console.WriteLine("error");
else
    Console.WriteLine("no error");

5 solutions

#1


1  

.*[^a-zA-Z0-9_].* will match the entire input as long as there is at least one character somewhere in it that is neither alphanumeric nor an underscore. (This holds for single-line input; because . does not match \n by default, a multi-line input may be only partly matched.) [^a-zA-Z0-9_] will match only a single such character, if there is one in the input. Regex.Match returns the leftmost match, so that is the first such character; it is in the .* version that the greedy leading .* pushes the class onto the last one. Which one you want depends on the input and on what you want to do once you find out whether such a character exists in the input.
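
A minimal C# sketch of the difference, assuming a hypothetical single-line input with two disallowed characters. A capture group (not part of the question's pattern) is added around the class only so we can see which character it landed on:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: the disallowed characters are ' ' (index 3) and '#' (index 7).
string input = "abc def#ghi";

// The question's pattern plus a capture group around the class, for inspection only.
Match wide = Regex.Match(input, ".*([^a-zA-Z0-9_]).*");
Match narrow = Regex.Match(input, "[^a-zA-Z0-9_]");

// Both succeed, so for a yes/no check the two patterns agree.
Console.WriteLine($"{wide.Success} {narrow.Success}");
// prints: True True

// The greedy leading .* gives back characters only until the class can match,
// so the class lands on the last disallowed character, '#'.
Console.WriteLine($"wide:   \"{wide.Value}\", class matched at {wide.Groups[1].Index}");
// prints: wide:   "abc def#ghi", class matched at 7

// Without .*, the engine returns the leftmost disallowed character, the space.
Console.WriteLine($"narrow: \"{narrow.Value}\" at {narrow.Index}");
// prints: narrow: " " at 3
```

For a pure "does it contain a bad character?" check, only Success matters, and there the two patterns behave the same.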

#2


2  

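The only difference is whether the "margin characters" (the characters before and after the disallowed one) are included in the match result.

For the input:

ab41--_71j

.*[^a-zA-Z0-9_].* matches the whole string, because the greedy .* on each side takes everything around the disallowed character:

ab41--_71j

Without the .* at the beginning and end, the first match is just the first - (at index 4). Only the two - characters can ever match, because _ is in the allowed set:

-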

Any string will match the .*[^a-zA-Z0-9_].* regex as long as it contains at least one character that isn't in a-zA-Z0-9_.
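
To list every match rather than only the first, Regex.Matches can be used. A small C# sketch for the ab41--_71j input; AllMatches is a hypothetical helper name and the LINQ projection is only there for printing:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "ab41--_71j";

// Hypothetical helper: collects the text of every match (Cast<Match>() works on all .NET versions).
static string[] AllMatches(string s, string pattern) =>
    Regex.Matches(s, pattern).Cast<Match>().Select(m => m.Value).ToArray();

// With .* on both sides: a single match covering the whole string.
string[] padded = AllMatches(input, ".*[^a-zA-Z0-9_].*");

// Bare class: one single-character match per disallowed character.
// '_' is in the allowed set, so only the two '-' characters match.
string[] bare = AllMatches(input, "[^a-zA-Z0-9_]");

Console.WriteLine(string.Join(" | ", padded));  // prints: ab41--_71j
Console.WriteLine(string.Join(" | ", bare));    // prints: - | -
```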

From your latest comment on your answer, I understand that you actually use:

^[a-zA-Z0-9]*$

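This matches only if every character is a letter or a digit. If it doesn't match, the string is invalid. (Because of the *, the empty string also matches; use + instead if at least one character is required.)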

If you also want to allow the _ character, then use:

^[a-zA-Z0-9_]*$

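Which can be shortened to the following, with one caveat: in .NET, \w also matches non-ASCII word characters (letters such as ß, for example), so it is only equivalent to [a-zA-Z0-9_] if you pass RegexOptions.ECMAScript: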

^\w*$
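
A small C# sketch comparing the explicit class with \w on a few hypothetical inputs; AsciiWord, UnicodeWord and EcmaWord are illustrative helper names, not library APIs:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helpers comparing the explicit ASCII class with \w.
static bool AsciiWord(string s) => Regex.IsMatch(s, "^[a-zA-Z0-9_]*$");
static bool UnicodeWord(string s) => Regex.IsMatch(s, @"^\w*$");
// RegexOptions.ECMAScript restricts \w to [a-zA-Z_0-9].
static bool EcmaWord(string s) => Regex.IsMatch(s, @"^\w*$", RegexOptions.ECMAScript);

foreach (string s in new[] { "anytest", "anytest#", "straße" })
{
    Console.WriteLine($"{s}: ascii={AsciiWord(s)} unicode={UnicodeWord(s)} ecma={EcmaWord(s)}");
}
// "straße" passes only the Unicode-aware \w check.
```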

In general, it is better to write a regex that validates strings (describes what is allowed) than one that invalidates them (looks for what is forbidden). It just makes more sense and is more intuitive.

So my validation would look like:

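using System;
using System.Text.RegularExpressions;

// ^ and $ anchor the pattern, so every character must be a word character.
if (Regex.IsMatch("anytest#", "^\\w*$"))
{
    Console.WriteLine("Success");
}
else
{
    Console.WriteLine("Error");
}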
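
For comparison, a minimal C# sketch of both styles for the rule that also allows _ (IsValid and HasInvalidChar are hypothetical helper names). On ordinary input they are exact opposites. One .NET detail worth knowing: $ also matches just before a final \n, so \z is the stricter end anchor for a whitelist:

```csharp
using System;
using System.Text.RegularExpressions;

// Whitelist ("validate"): the whole string must consist of allowed characters.
// \z anchors at the very end; $ would also accept a single trailing "\n".
static bool IsValid(string s) => Regex.IsMatch(s, @"^[a-zA-Z0-9_]*\z");

// Blacklist ("invalidate"): look for any one disallowed character.
static bool HasInvalidChar(string s) => Regex.IsMatch(s, "[^a-zA-Z0-9_]");

foreach (string s in new[] { "anytest", "anytest#", "any_test", "" })
{
    // For these inputs the two checks are exact opposites.
    Console.WriteLine($"\"{s}\": valid={IsValid(s)} hasInvalid={HasInvalidChar(s)}");
}

// The $ pitfall: "abc\n" passes with $ but not with \z.
Console.WriteLine(Regex.IsMatch("abc\n", "^[a-zA-Z0-9_]*$"));   // prints: True
Console.WriteLine(IsValid("abc\n"));                           // prints: False
```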

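Another option, which is probably faster because it avoids the regex engine altogether (like \w, char.IsLetterOrDigit also accepts non-ASCII letters and digits):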

using System;
using System.Linq;

// string implements IEnumerable<char>, so All() can be called on it directly.
if ("anytest#".All(c => char.IsLetterOrDigit(c) || c == '_'))
{
    Console.WriteLine("Success");
}
else
{
    Console.WriteLine("Error");
}

And if you don't want to allow '_', it looks even nicer:

using System;
using System.Linq;

// The method group char.IsLetterOrDigit is passed directly as the predicate.
if ("anytest#".All(char.IsLetterOrDigit))
{
    Console.WriteLine("Success");
}
else
{
    Console.WriteLine("Error");
}
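
char.IsLetterOrDigit follows Unicode categories, so it accepts letters such as ß, unlike the original [a-zA-Z0-9_] class. If only ASCII should pass, a sketch with explicit ranges (IsAsciiWordChar is a hypothetical helper name):

```csharp
using System;
using System.Linq;

// Hypothetical helper: true only for the characters in [a-zA-Z0-9_].
static bool IsAsciiWordChar(char c) =>
    (c >= 'a' && c <= 'z') ||
    (c >= 'A' && c <= 'Z') ||
    (c >= '0' && c <= '9') ||
    c == '_';

foreach (string s in new[] { "anytest", "anytest#", "straße" })
{
    bool unicodeOk = s.All(c => char.IsLetterOrDigit(c) || c == '_');
    bool asciiOk = s.All(IsAsciiWordChar);
    Console.WriteLine($"{s}: IsLetterOrDigit={unicodeOk} ascii={asciiOk}");
}
// "straße" passes the IsLetterOrDigit check but not the ASCII one.
```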

#3


1  

No, because there are other characters besides a-z, A-Z and 0-9.

That regex matches any string that starts with any characters, followed by one character that is not in a-zA-Z0-9_, and ends with any characters. That includes a string consisting only of such characters, with no a-zA-Z0-9 at all (for example ß).

If you leave out the .*, you just have a regex that matches a single character that is not in a-zA-Z0-9_.

.*[^a-zA-Z0-9_].*  matches for instance: ABC_ß_ABC
[^a-zA-Z0-9_]      matches for instance: ß   (and this regex just matches 1 character)

#4


1  

Input 1: ABC_ß_ABC

Input 2: ß

Regex 1: .*[^a-zA-Z0-9_].*

Regex 2: [^a-zA-Z0-9_]

Both inputs match both regexes.

For input 1:

Regex 1 matches all 9 characters.

Regex 2 matches only 1 character (ß).
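
A quick C# check of these lengths using Match.Length; Describe is a hypothetical helper name:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: reports how many characters each regex's first match covers.
static string Describe(string s)
{
    Match m1 = Regex.Match(s, ".*[^a-zA-Z0-9_].*"); // Regex 1
    Match m2 = Regex.Match(s, "[^a-zA-Z0-9_]");     // Regex 2
    return $"regex1={m1.Length} regex2={m2.Length} (\"{m2.Value}\")";
}

foreach (string input in new[] { "ABC_ß_ABC", "ß" })
{
    // ABC_ß_ABC -> regex1=9 regex2=1 ("ß"); ß -> regex1=1 regex2=1 ("ß")
    Console.WriteLine($"{input}: {Describe(input)}");
}
```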

#5


1  

Only include the tokens in the regex that you are actually looking for. In your case, you don't care whether there are any other characters before or after the negated character class you specified. Adding .* before and after it doesn't change whether the match succeeds, but it makes matching more complicated. A regex can already match anywhere in the input unless you explicitly anchor it, e.g. with ^ at the start.
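
Because .* may match zero characters, the padded pattern succeeds exactly when the bare class finds a character somewhere, so the two always agree on success. A quick C# check over a few hypothetical inputs (Padded and Bare are illustrative names):

```csharp
using System;
using System.Text.RegularExpressions;

// The question's pattern and the bare character class.
static bool Padded(string s) => Regex.IsMatch(s, ".*[^a-zA-Z0-9_].*");
static bool Bare(string s) => Regex.IsMatch(s, "[^a-zA-Z0-9_]");

// Hypothetical sample inputs, including an empty and a two-line string.
string[] inputs = { "anytest", "anytest#", "", "ABC_ß_ABC", "line1\nline2" };

foreach (string s in inputs)
{
    // Show "\n" literally so each input prints on one line.
    string shown = s.Replace("\n", "\\n");
    Console.WriteLine($"\"{shown}\": padded={Padded(s)} bare={Bare(s)}");
}
```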
