Digits are optional, and are only allowed at the end of a word.
Spaces are optional, and are only allowed in the middle of a word.
I am pretty much just trying to match the possible month names in a few languages, say English and Vietnamese.
For example, the following are valid matches:
'June'
'tháng 6'
'thang 6'
But the following are not valid, because of the leading or trailing space:
'June '
' June'
These are my test cases: https://regex101.com/r/pZ0mN3/2.
As you can see, I came up with ^\S[\S ]+\S$, which kind of works, but I wonder if there's a better way to do it.
1 solution
#1
To match a string with no leading and trailing spaces in the JavaScript regex flavor, you can use several options:
- Require the first and the last character to be non-whitespace with \S (= [^\s]). This can be done with, say, ^\S[\S\s]*\S$. This regex requires at least 2 characters in the string. Your regex requires 3 chars in the input since you used +. It won't allow some Unicode whitespace inside the string either, since [\S ] only admits a regular space.
- You may use a combination of grouping with optional quantifiers (those allowing 0-length matches). See ^\S(?:\s*\S+)*$ (in the demo, \s is replaced, since it is a multiline demo). The \S at the beginning matches a non-whitespace char, and then a non-capturing group follows that is * quantified (matches zero or more occurrences) and matches 0+ sequences of 0+ whitespaces followed with 1+ non-whitespace characters. This is a good expression for flavors like RE2 that do not support lookarounds but do support quantified groups.
- You may use lookaheads to require the first and last characters to be non-whitespace: ^(?=[\S\s]*\S$)\S[\S\s]*$, where (?=[\s\S]*\S$) requires the last char to be non-whitespace and the \S after the lookahead requires the first char to be non-whitespace. [\s\S]* matches 0+ any characters. This will match 1-char strings, but won't match empty strings.
- If your regex to match strings with no leading/trailing whitespace should also match an empty string, use 2 negative lookaheads: ^(?!\s)(?![\S\s]*\s$)[\S\s]*$. The (?!\s) lookahead will fail the match if there is leading whitespace, (?![\S\s]*\s$) will do the same in case of trailing whitespace, and [\s\S]* will match 0+ any characters. If lookarounds are not supported, use ^(?:\S(?: *\S+)*)?$, which is much less efficient.
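To see how the four patterns in the list above differ, here is a minimal sketch that runs each of them against a few inputs. The names patterns, samples and acceptedBy are made up for this illustration, and the sample strings are only modelled on the question's test cases, not copied from the linked demo.

```javascript
// The four patterns from the list above, as JavaScript RegExp literals.
const patterns = {
  twoPlusChars: /^\S[\S\s]*\S$/,               // needs at least 2 characters
  groupOnly: /^\S(?:\s*\S+)*$/,                // no lookarounds needed
  lookahead: /^(?=[\S\s]*\S$)\S[\S\s]*$/,      // needs at least 1 character
  allowEmpty: /^(?!\s)(?![\S\s]*\s$)[\S\s]*$/, // also accepts ""
};

// Hypothetical sample inputs for illustration.
const samples = ["June", "tháng 6", "thang 6", "June ", " June", "6", ""];

// Returns the names of the patterns that accept the given string.
function acceptedBy(input) {
  return Object.keys(patterns).filter((name) => patterns[name].test(input));
}

for (const s of samples) {
  console.log(JSON.stringify(s), "->", acceptedBy(s).join(", ") || "(none)");
}
```

Strings with a leading or trailing space are rejected by all four patterns; the single character "6" is rejected only by the first one, and the empty string is accepted only by the last one.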
If you do not need to match arbitrary chars between the non-whitespace chars, you may revert [\s\S] to your [\S ]. In PCRE, a horizontal whitespace can be matched with \h; in .NET and other flavors that support Unicode properties, you can use [\t\p{Zs}] to match any horizontal whitespace. In JS, [^\S\r\n\f\v\u2028\u2029] can be used for that purpose.
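As a rough sketch of how this could look in JavaScript, the snippet below plugs the horizontal-whitespace class into the group-based pattern and, separately, uses [\S ] inside the lookahead pattern. The names horizontalOnly and plainSpaceOnly are made up here, and combining the classes with these particular patterns is an assumption for illustration.

```javascript
// Group-based pattern with \s narrowed to horizontal whitespace only:
// any whitespace except line breaks and other vertical whitespace.
const horizontalOnly = /^\S(?:[^\S\r\n\f\v\u2028\u2029]*\S+)*$/;

// Lookahead pattern with [\s\S] reverted to [\S ]:
// only a regular space is tolerated between non-whitespace chars.
const plainSpaceOnly = /^(?=[\S ]*\S$)\S[\S ]*$/;

console.log(horizontalOnly.test("thang 6"));      // true
console.log(horizontalOnly.test("thang\t6"));     // true: tab is horizontal
console.log(horizontalOnly.test("thang\u00a06")); // true: no-break space
console.log(horizontalOnly.test("thang\n6"));     // false: line break inside

console.log(plainSpaceOnly.test("thang 6"));      // true
console.log(plainSpaceOnly.test("thang\t6"));     // false: tab is not " "
```

The first variant is the more permissive one: it accepts tabs and Unicode spaces in the middle, while the second one sticks to the question's original restriction to a plain space.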
Note that some regex flavors do not support non-capturing groups; in that case, you may replace all (?: with ( in the above patterns.
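A small sketch of that substitution, using the group-based pattern as the example (the variable names are made up for this illustration): both forms accept and reject the same strings, and the only visible difference is the extra capture in the match result.

```javascript
const nonCapturing = /^\S(?:\s*\S+)*$/;
const capturing = /^\S(\s*\S+)*$/; // same pattern with (?: replaced by (

// Hypothetical inputs for illustration.
const inputs = ["June", "tháng 6", "June ", " June"];
const sameVerdict = inputs.every(
  (s) => nonCapturing.test(s) === capturing.test(s)
);
console.log(sameVerdict); // true

// The capturing form also records the last repetition of the group.
const withGroup = "tháng 6".match(capturing);
const withoutGroup = "tháng 6".match(nonCapturing);
console.log(withoutGroup.length); // 1
console.log(withGroup.length);    // 2
```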