What is the best way to match only letters in a regular expression?

Time: 2022-09-20 12:16:32

I would really like to use \w, but it also matches underscores, so I'm going with [A-Za-z], which feels unnecessarily verbose and America-centric. Is there a better way to do this? Something like [\w^_] (I doubt I got that syntax right)?

7 answers

#1


7  

You could use /[a-z]/i or /[[:alpha:]]/ just as well. In fact, \w also matches digits, so it would not work here even without the underscore problem.
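
To see why \w is too broad for this job, here is a minimal sketch. The helper name matches_word_char is made up for illustration; it checks a one-character string against \w.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the whole one-character string matches \w.
sub matches_word_char {
    my ($char) = @_;
    return $char =~ /\A\w\z/ ? 1 : 0;
}

# \w accepts letters, but also digits and the underscore.
for my $char ('a', 'Z', '7', '_', '-') {
    printf "%-3s %s\n", "'$char'",
        matches_word_char($char) ? 'matches \w' : 'no match';
}
```

Only the hyphen is rejected here; the digit and the underscore both get through, which is exactly what the question wants to avoid.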

#2


15  

Perhaps you mean /[[:alpha:]]/? See perlre for the discussion of POSIX character classes.

#3


11  

Just use \p{L}, which means "any Unicode letter" and works in Perl (/\p{L}/). You will probably also need use utf8; if your source file contains non-ASCII literals.
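
A minimal sketch of \p{L} in use. The helper name only_letters and the sample strings are assumptions for illustration; the \x{...} escapes keep the file pure ASCII, so this particular example does not need use utf8;.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string consists only of Unicode letters.
sub only_letters {
    my ($text) = @_;
    return $text =~ /\A\p{L}+\z/ ? 1 : 0;
}

my $cyrillic = "\x{436}\x{443}\x{43A}";   # three Cyrillic letters

print "hello:      ", only_letters('hello'),      "\n";
print "cyrillic:   ", only_letters($cyrillic),    "\n";
print "snake_case: ", only_letters('snake_case'), "\n";
print "abc123:     ", only_letters('abc123'),     "\n";
```

The first two calls return 1 and the last two return 0, because the underscore and the digits are not in the Unicode Letter category.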

#4


8  

Matching international (i.e., non-ASCII) characters is kind of tough and can depend on a lot of things. Check out this example:

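#!perl -w

use strict;
use utf8;    # the source file is UTF-8, so "ä" below is a single character

my $string = "ä";

# Try several candidate "letter" patterns against a non-ASCII letter.
print "matched :alpha:\n"  if $string =~ /[[:alpha:]]/;   # POSIX class
print "matched ^\\W0-9_\n" if $string =~ /[^\W0-9_]/;     # \w minus 0-9 and underscore
print "matched [a-zA-Z]\n" if $string =~ /[a-zA-Z]/;      # ASCII letters only
print "matched [a-z]i\n"   if $string =~ /[a-z]/i;        # ASCII letters, case-insensitive
print "matched [A-z]\n"    if $string =~ /[A-z]/;         # ASCII range A..z, punctuation included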

For me, this results in:

matched :alpha:

If you remove use utf8;, then none of the regular expressions match.

Looking at this very relevant question, you probably want use utf8; and should check out Unicode::Semantics.

Of course, if you're using straight ASCII characters, then any of the aforementioned regular expressions will work.
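
One of the things it can depend on is whether the text has been decoded from bytes into characters. use utf8; only covers literals in the source file; data read from a file or socket usually arrives as raw bytes. A minimal sketch, assuming the input is UTF-8 and using the core Encode module (the helper name is_single_letter is made up for illustration):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true if the string is exactly one Unicode letter.
sub is_single_letter {
    my ($text) = @_;
    return $text =~ /\A\p{L}\z/ ? 1 : 0;
}

my $bytes = "\xC3\xA4";                # UTF-8 encoding of a-umlaut, still raw bytes
my $chars = decode('UTF-8', $bytes);   # decoded into a one-character string

printf "raw bytes: length %d, single letter: %s\n",
    length($bytes), is_single_letter($bytes) ? 'yes' : 'no';
printf "decoded:   length %d, single letter: %s\n",
    length($chars), is_single_letter($chars) ? 'yes' : 'no';
```

Before decoding, the regex sees two separate characters, so the single-letter check fails; after decoding it sees one letter.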

#5


6  

[^\W0-9_]

# or

[[:alpha:]]

See perldoc perlre.
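
The first pattern works by double negation: \W means "not a word character", so [^\W0-9_] reads as "a word character that is not 0-9 and not an underscore". As a hedged aside that is not part of the answer: on Unicode strings \w also covers non-ASCII digits, so [^\W\d_] is a slightly stricter variant. The helper names below are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers: each checks a single-character string.
sub is_letter_ascii_digits_excluded { $_[0] =~ /\A[^\W0-9_]\z/ ? 1 : 0 }
sub is_letter_all_digits_excluded   { $_[0] =~ /\A[^\W\d_]\z/  ? 1 : 0 }

my $arabic_three = "\x{663}";   # ARABIC-INDIC DIGIT THREE, a non-ASCII digit

for my $case (['a', 'a'], ['5', '5'], ['_', '_'], ['U+0663', $arabic_three]) {
    my ($label, $char) = @$case;
    printf "%-7s [^\\W0-9_]: %d   [^\\W\\d_]: %d\n", $label,
        is_letter_ascii_digits_excluded($char),
        is_letter_all_digits_excluded($char);
}
```

Both variants accept 'a' and reject '5' and '_'. They differ only on the non-ASCII digit, which the 0-9 version lets through.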

#6


4  

A few options:

1. /[a-z]/i               # case insensitive
2. /[A-Z]/i               # case insensitive
3. /[A-z]/                # explicit range listing (capital 'A' to lowercase 'z')
4. /[[:alpha:]]/          # POSIX alpha character class

I recommend using either the case-insensitive form or the explicit /[a-zA-Z]/, unless you have a certain language preference in mind.

Note:

  • Number 3 requires the capital 'A' first and then the lowercase 'z' because of the order of the ASCII values; the reverse, a-Z, does not work. Also, this method fails the no-underscore criterion, since the range also includes [ \ ] ^ _ ` .
  • Number 4 will match those additional language characters, but it also matches:
    ʹʺʻˍˎˏːˑˬˮ̀́   (plus many others)
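
The point about number 3 can be checked directly. This is a minimal sketch; the function name non_letters_matched is hypothetical. It scans the ASCII range and collects every non-letter a pattern accepts.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every ASCII character that the given
# pattern matches even though it is not an ASCII letter.
sub non_letters_matched {
    my ($regex) = @_;
    return join '',
        grep { $_ =~ $regex && $_ !~ /[A-Za-z]/ }
        map  { chr($_) } 0 .. 127;
}

print "[A-z]    also matches: ", non_letters_matched(qr/\A[A-z]\z/), "\n";
print "[A-Za-z] also matches: '", non_letters_matched(qr/\A[A-Za-z]\z/), "'\n";
```

The first line lists the six characters that sit between 'Z' and 'a' in ASCII, underscore included; the second line shows an empty string.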

#7


1  

Are you looking for internationalization in your regex? Then you'll need to do something like what was done in this question: JavaScript validation issue with international characters

Explicitly match all of the non-ASCII letters you need :)
