I have just started with regular expressions and was solving this question, in which the task is to check whether a username is valid. A valid username has the following properties:
- The username can contain alphanumeric characters and/or underscores (_).
- The username must start with an alphabetic character.
- 8 <= (username length) <= 30.
I am using this as my reference, which says:
\w Matches the word characters.
I came up with this solution:
String pattern = "^\\w(\\d|\\w|_){7,29}$";
which is not correct. After searching for a while, I found that the correct solution is
String pattern = "^[a-zA-Z][a-zA-Z0-9_]{7,29}$";
which is pretty clear to understand.
What I want to confirm is whether (\\w|\\d|_) is equivalent to [a-zA-Z0-9_] or not.
I think they are, because String pattern = "^[a-zA-Z](\\w|\\d|_){7,29}$"; is accepted for all test cases.
Also, this post has two different equivalent expressions for \\w as answers, with one upvote each. I want to know which one is correct: [A-Za-z\s] or [A-Za-z0-9_]?
3 solutions
#1
1
Yes. According to the Java summary of regular-expression constructs at https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html:
\d A digit: [0-9]
\w A word character: [a-zA-Z_0-9]
So (\w|\d|_) is equivalent to ([a-zA-Z_0-9]|[0-9]|_), where the \d and the extra underscore are both redundant since \w already includes them.
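One way to check this claim empirically is to compare the two patterns on every single-character string. The sketch below is an illustration, not part of the original answer. The class and method names are made up for the example, and it assumes Java's default flags (in particular, Pattern.UNICODE_CHARACTER_CLASS is not set).

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
class WordClassEquivalence {
    // The alternation from the question and the explicit character class.
    static final Pattern ALTERNATION = Pattern.compile("(\\w|\\d|_)");
    static final Pattern EXPLICIT = Pattern.compile("[a-zA-Z0-9_]");

    // Returns true if both patterns give the same verdict for every
    // one-character string over the full 16-bit char range.
    static boolean agreeOnAllChars() {
        for (int c = 0; c <= Character.MAX_VALUE; c++) {
            String s = String.valueOf((char) c);
            boolean viaAlternation = ALTERNATION.matcher(s).matches();
            boolean viaExplicit = EXPLICIT.matcher(s).matches();
            if (viaAlternation != viaExplicit) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("patterns agree: " + agreeOnAllChars());
    }
}
```

If the two patterns ever disagreed on a character, agreeOnAllChars() would return false, which makes this a quick sanity check for the equivalence under default flags.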
#2
1
Okay, so after thinking this over for a while and trying some different solutions to the question:
\w is, in fact, equivalent to [A-Za-z0-9_], as given in the official documentation (https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html), and not to [a-zA-Z\s] as stated in this answer.
As for the question, String pattern = "^[a-zA-Z]\\w{7,29}$"; is accepted for all the test cases and seems to me the shortest answer possible.
Therefore, although (\\w|\\d|_) is equivalent to [a-zA-Z0-9_], using \\w alone is sufficient.
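To make the difference between the patterns concrete, here is a minimal sketch of a validator built on the shorter pattern, alongside the question's first attempt. The class name, method names, and sample usernames are invented for the illustration and are not taken from the original exercise.

```java
import java.util.regex.Pattern;

// Hypothetical class, named only for this illustration.
class UsernameValidator {
    // One letter, then 7 to 29 word characters: 8 to 30 characters in total.
    static final Pattern VALID = Pattern.compile("^[a-zA-Z]\\w{7,29}$");

    // The first attempt from the question: \w also allows a digit or an
    // underscore in the first position, which the rules forbid.
    static final Pattern FIRST_ATTEMPT = Pattern.compile("^\\w(\\d|\\w|_){7,29}$");

    static boolean isValid(String username) {
        // matches() requires the whole input to match the pattern.
        return VALID.matcher(username).matches();
    }

    static boolean firstAttemptAccepts(String username) {
        return FIRST_ATTEMPT.matcher(username).matches();
    }

    public static void main(String[] args) {
        // Made-up sample inputs.
        String[] samples = {"Samantha_21", "1Samantha", "_Samantha", "Sam_123", "Samantha?10"};
        for (String s : samples) {
            System.out.println(s + " valid=" + isValid(s)
                    + " firstAttempt=" + firstAttemptAccepts(s));
        }
    }
}
```

Because matches() already requires the entire input to match, the ^ and $ anchors are optional here. They are kept so the pattern behaves the same way if it is later used with find().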
P.S. When in doubt during the learning phase, always stick to the official documentation rather than anybody's answer or tutorial. Hope this helps someone with the same doubt.
Edit: Thank you @4castle @trey for your suggestions.
#3
0
\w stands for “word character”. Exactly which characters it matches differs between regex engines.
- In all engines, it will include [A-Za-z].
- In most, the underscore and digits are also included.
- In some engines, word characters from other languages may also match.
The best way to find out is to run a couple of tests with the regex engine you are using: write a test string and search it with the regex \w to see what it matches.
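For Java specifically, such a test might look like the sketch below. The class name, method name, and probe characters are chosen for the illustration. Pattern.UNICODE_CHARACTER_CLASS is a real flag in java.util.regex.Pattern. It is included here on the assumption that the reader also wants to see when Java's \w goes beyond ASCII.

```java
import java.util.regex.Pattern;

// Hypothetical probe class, named only for this illustration.
class WordCharProbe {
    // Default behaviour: \w is limited to [a-zA-Z_0-9].
    static final Pattern ASCII_WORD = Pattern.compile("\\w");
    // With this flag, \w follows the Unicode definition of a word character.
    static final Pattern UNICODE_WORD =
            Pattern.compile("\\w", Pattern.UNICODE_CHARACTER_CLASS);

    // True if the given one-character string is a word character for p.
    static boolean isWordChar(Pattern p, String ch) {
        return p.matcher(ch).matches();
    }

    public static void main(String[] args) {
        // Probe characters: ASCII letters, a digit, underscore, hyphen,
        // space, e with acute accent (U+00E9), and a CJK ideograph (U+4E2D).
        String[] probes = {"a", "Z", "7", "_", "-", " ", "\u00e9", "\u4e2d"};
        for (String ch : probes) {
            System.out.println("'" + ch + "'"
                    + " default=" + isWordChar(ASCII_WORD, ch)
                    + " unicode=" + isWordChar(UNICODE_WORD, ch));
        }
    }
}
```

Running the probe with and without the flag shows which of the three cases in the list above applies to the engine and configuration in use.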