Is the regular expression [a-Z] valid, and if so, is it the same as [a-zA-Z]?

Time: 2022-03-15 16:47:21

Is the regular expression [a-Z] valid, and if so, is it the same as [a-zA-Z]? Please note that in [a-Z] the a is lowercase and the Z is uppercase.

Edit:

I received some answers specifying that while [a-Z] is not valid, [A-z] is valid (but won't be the same as [a-zA-Z]), and this is really what I was looking for, since I wanted to know in general whether it's possible to replace [a-zA-Z] with a more compact version.

Thanks to all who contributed to the answer.

7 answers

#1


34  

No, a (97) is higher than Z (90), so [a-Z] isn't a valid character class. [A-z] is valid, but it wouldn't be equivalent to [a-zA-Z] either, for a different reason: it would cover all the letters but would also include the six characters that sit between the uppercase and lowercase letters in ASCII: [ \ ] ^ _ `.
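To see the code points behind that answer, here is a minimal Python sketch. Python isn't mentioned in the question, so treat it as an illustration only; the variable names are mine.

```python
# Code points of the two range endpoints from the question.
code_a = ord("a")  # 97
code_Z = ord("Z")  # 90

# A range is only well-formed if the start does not come after the end.
a_to_Z_is_backwards = code_a > code_Z

# Characters whose code points lie strictly between 'Z' and 'a'.
between = [chr(c) for c in range(code_Z + 1, code_a)]
print(between)  # ['[', '\\', ']', '^', '_', '`']
```

Those six characters are exactly what [A-z] picks up in addition to the letters.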

#2


4  

I'm not sure about other languages' implementations, but in PHP you can do

"/[a-z]/i"

and it will be case-insensitive. There is probably something similar in other languages.
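As one example of "something similar", a rough Python equivalent might look like the sketch below. The sample string is made up. I've added re.ASCII on the assumption that only A-Z and a-z should match; without it, Python's case folding on str patterns also lets a few non-ASCII letters through.

```python
import re

# IGNORECASE plays the role of PHP's /i modifier.
# ASCII restricts case folding to A-Z/a-z (assumption: ASCII-only letters are wanted).
pattern = re.compile(r"[a-z]", re.IGNORECASE | re.ASCII)

matches = pattern.findall("aZ_9[")
print(matches)  # ['a', 'Z']
```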

#3


3  

You don't specify what language, but in general [a-Z] won't be a valid range, as in ASCII the lower-case alpha characters come after the upper-case ones. [A-z] might be a valid range (indicating all upper- and lower-cased alphas as well as the punctuation that appears between Z and a), but it might not be, depending on your particular implementation. The i flag can be added to the regex to make it case-insensitive; check your particular implementation for instructions on how to specify that flag.

#4


2  

You could always try it:

 print "ok" if "monkey" =~ /[a-Z]/;

Perl says

Invalid [] range "a-Z" in regex; marked by <-- HERE in m/[a-Z <-- HERE ]/ at a-z.pl line 4.
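The same "just try it" check can be done in other engines. Below is a small Python sketch; try_compile is a hypothetical helper name of mine, and the exact wording of the error message may vary between Python versions, so it is printed rather than quoted.

```python
import re

def try_compile(pattern):
    """Return None if the pattern compiles, otherwise the engine's error message."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

reversed_range = try_compile("[a-Z]")  # start of range comes after its end
valid_range = try_compile("[A-z]")     # well-formed, though wider than the letters

print(reversed_range)
print(valid_range)
```

Python's re module rejects [a-Z] at compile time, much like Perl does, while [A-z] compiles without complaint.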

#5


2  

If it's valid, it won't do what you expect.

The character code of Z is lower than the character code of a, so if the codes are swapped to mean the range [Z-a], it will be the same as [Z\[\\\]^_`a], i.e. it will include the characters Z and a, and the characters between.

If you use [A-z] to get all upper and lower case characters, that is still not the same as [A-Za-z]; it's the same as [A-Z\[\\\]^_`a-z].
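That equivalence can be checked by brute force over the ASCII range. The sketch below uses Python's re module as a stand-in engine (an assumption; other engines may differ) and compares the three character classes one character at a time.

```python
import re

wide = re.compile(r"[A-z]")
spelled_out = re.compile(r"[A-Z\[\\\]^_`a-z]")
letters = re.compile(r"[A-Za-z]")

ascii_chars = [chr(c) for c in range(128)]

# [A-z] and the spelled-out class should accept exactly the same characters.
same = all(
    bool(wide.fullmatch(ch)) == bool(spelled_out.fullmatch(ch))
    for ch in ascii_chars
)

# Characters that [A-z] accepts but [A-Za-z] does not.
extra = "".join(
    ch for ch in ascii_chars
    if wide.fullmatch(ch) and not letters.fullmatch(ch)
)

print(same)   # True
print(extra)  # [\]^_`
```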

#6


1  

No, it's not valid, probably because the range runs backwards in ASCII: a (97) comes after Z (90).

#7


1  

I've just fallen over this in a script (not my own).

It seems that grep, awk and sed accept or reject [a-Z] depending on your locale (i.e. the LANG or LC_CTYPE environment variable). In the POSIX locale these tools don't allow [a-Z], but in some other locales (e.g. en_gb.utf8) it works, and is the same as [a-zA-Z].

Yes, I've checked, it doesn't match any of _^[]`.
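One plausible explanation for this behaviour is that such locales sort letters with the cases interleaved, so a range is taken in collation order rather than code-point order. The Python sketch below only simulates that idea: the ordering "aAbB...zZ" is an assumption of mine for illustration, not something read from any real locale definition, and collation_range is a hypothetical helper.

```python
import string

# Hypothetical collation order with cases interleaved: a A b B ... z Z.
collation = "".join(
    lo + up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)
)

def collation_range(start, end, order=collation):
    """Return the characters from start to end under the given ordering."""
    i, j = order.index(start), order.index(end)
    if i > j:
        raise ValueError(f"invalid range {start}-{end}")
    return order[i : j + 1]

a_to_Z = collation_range("a", "Z")

# Under this ordering a-Z spans every letter and none of the punctuation.
covers_all_letters = set(a_to_Z) == set(string.ascii_letters)
print(covers_all_letters)  # True
```

Under that assumed ordering, a-Z is a forward range covering exactly the 52 letters, which would match what I observed.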

Given that this has taken quite some time to debug, I strongly discourage anyone from ever using [a-Z] in a regex.
