Regex: does \w mean [a-zA-Z] or [a-zA-Z0-9_], given that most tutorials only say that \w matches word characters?

Time: 2022-06-29 21:15:48

I have just started with regular expressions and was solving a problem in which the task is to check whether a username is valid. A valid username has the following properties:

  1. The username can contain alphanumeric characters and/or underscores (_).
  2. The username must start with an alphabetic character.
  3. 8 <= (username length) <= 30.

I am using this as my reference, which says:

\w Matches the word characters.

From that I came up with String pattern = "^\\w(\\d|\\w|_){7,29}$"; which is not the correct solution. After searching for a while, I found that the correct solution is

String pattern = "^[a-zA-Z][a-zA-Z0-9_]{7,29}$"; which is pretty clear to understand.

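As a rough illustration of the corrected pattern, here is a minimal Java sketch. The class name UsernameCheck, the helper isValid, and the sample usernames are made up for this example and are not part of the original problem statement.

```java
import java.util.regex.Pattern;

class UsernameCheck {
    // First character: an ASCII letter; then 7 to 29 letters, digits, or underscores,
    // which gives a total length between 8 and 30.
    private static final Pattern USERNAME =
            Pattern.compile("^[a-zA-Z][a-zA-Z0-9_]{7,29}$");

    // Hypothetical helper: true only if the whole string is a valid username.
    static boolean isValid(String name) {
        return USERNAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Sample inputs chosen for illustration only.
        String[] samples = {"alice_2022", "_alice2022", "9lives_cat", "short", "has space1"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Of these samples only alice_2022 should be accepted: the others start with an underscore or a digit, are too short, or contain a space. Since matches() already requires the whole input to match, the ^ and $ anchors are redundant here; they are kept so the pattern reads the same as in the question.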

What I want to confirm is whether (\\w|\\d|_) is equivalent to [a-zA-Z0-9_].

I think they are, because String pattern = "^[a-zA-Z](\\w|\\d|_){7,29}$"; is accepted for all test cases.

Also, this post has two different expressions given as equivalents of \\w, in answers with one upvote each. I want to know which one is correct: [A-Za-z\s] or [A-Za-z0-9_]?

3 Answers

#1


1  

Yes. According to the Java summary of regular-expression constructs found here: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html

\d  A digit: [0-9]
\w  A word character: [a-zA-Z_0-9]

So (\w|\d|_) is equivalent to ([a-zA-Z_0-9]|[0-9]|_), where the extra underscore is redundant since it's included with \w.

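One way to check this claim empirically, rather than taking it on trust, is to compare the patterns one character at a time. The sketch below is only an illustration: the class WordClassCheck and the helper agreeOnAscii are hypothetical names, the patterns are compiled with Java's default flags, and the comparison is limited to the 128 ASCII characters.

```java
import java.util.regex.Pattern;

class WordClassCheck {
    // Returns true if both regexes accept exactly the same single ASCII characters (0..127).
    static boolean agreeOnAscii(String regexA, String regexB) {
        Pattern a = Pattern.compile(regexA);
        Pattern b = Pattern.compile(regexB);
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (a.matcher(s).matches() != b.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The alternation from the question versus the explicit character class.
        System.out.println(agreeOnAscii("(\\w|\\d|_)", "[a-zA-Z0-9_]"));
        // \w on its own versus the explicit character class.
        System.out.println(agreeOnAscii("\\w", "[a-zA-Z0-9_]"));
        // \w versus the other candidate mentioned in the question.
        System.out.println(agreeOnAscii("\\w", "[A-Za-z\\s]"));
    }
}
```

With default flags the first two comparisons should print true and the third false, since [A-Za-z\s] accepts whitespace and rejects digits and the underscore.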

#2


1  

After thinking this over for a while and trying some different solutions to the question, I found that \w is, in fact, equivalent to [A-Za-z0-9_], which is also what the official documentation says (https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html), and not to [a-zA-Z\s] as stated in this answer.

As for the question, String pattern = "^[a-zA-Z]\\w{7,29}$"; is accepted for all the test cases and seems to me to be the shortest possible answer.

Therefore, although (\\w|\\d|_) is equivalent to [a-zA-Z0-9_], using \\w alone is sufficient.

P.S. When in doubt while learning, rely on the official documentation rather than on anybody's answer or a tutorial found somewhere. I hope this helps someone with the same doubt.

Edit: Thank you @4castle @trey for your suggestions.

#3


0  

\w stands for “word character”. Exactly which characters it matches differs between regex engines.

  1. In all engines, it will include [A-Za-z].
  2. In most, the underscore and digits are also included.
  3. In some engines, word characters from other languages may also match.

The best way to find out is to run a couple of tests with the regex engine you are using. Write a test string and search it with the regex \w to see what it matches.

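For Java specifically, such a test could look like the sketch below. It is an illustration under assumptions: the class WordCharProbe, the helper wordChars, and the test string are invented here. Pattern.UNICODE_CHARACTER_CLASS is a standard flag of java.util.regex.Pattern that switches the predefined classes to their Unicode definitions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCharProbe {
    // Collects every character of `text` that \w matches under the given flags.
    static String wordChars(String text, int flags) {
        Matcher m = Pattern.compile("\\w", flags).matcher(text);
        StringBuilder found = new StringBuilder();
        while (m.find()) {
            found.append(m.group());
        }
        return found.toString();
    }

    public static void main(String[] args) {
        // Test string: ASCII letters, underscore, digit, space, e-acute, hyphen, sharp s, '!'.
        String text = "aZ_9 \u00e9-\u00df!";
        // Default flags: \w is [a-zA-Z_0-9].
        System.out.println(wordChars(text, 0));
        // Unicode mode: non-ASCII letters also count as word characters.
        System.out.println(wordChars(text, Pattern.UNICODE_CHARACTER_CLASS));
    }
}
```

With default flags only aZ_9 should be collected, while in Unicode mode the two accented letters should be picked up as well. This matches the point above that the meaning of \w depends on the engine and on its settings.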
