Meaning of the question mark and x in a regular expression group

I'm trying to learn Atom's syntax highlighting/grammar rules, which heavily use JS regular expressions, and came across an unfamiliar pattern in the python grammar file.

The pattern starts with (?x), a construct I haven't seen before. I looked it up in an online regex tester, which seems to say that it's invalid. My initial thought was that it represents an optional left paren, but I believe the paren would have to be escaped for that.

Does this only have meaning in Atom's CoffeeScript grammar, or am I overlooking a regex meaning?

(This pattern also appears in the TextMate language file that I believe Atom's grammar came from.)

2 Answers

#1

If that regular expression gets processed in Python, it'll be compiled with the 'verbose' flag.

From the Python re docs:

(?aiLmsux)

(One or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function.
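
As a minimal sketch of what the quoted passage describes, the snippet below compiles the same pattern twice: once with the inline (?x) group and once with the re.VERBOSE flag argument. The pattern and the sample string are made up for illustration and do not come from the Atom grammar.

```python
import re

# Same pattern written twice: inline flag vs. flag argument.
# (Illustrative pattern, not taken from the Atom grammar.)
inline = re.compile(r"(?x) \d+ \s* %   # a number followed by a percent sign")
explicit = re.compile(r" \d+ \s* %   # a number followed by a percent sign", re.VERBOSE)

sample = "My value: 56 %"
inline_match = inline.search(sample).group(0)
explicit_match = explicit.search(sample).group(0)

print(inline_match)                      # 56 %
print(explicit_match)                    # 56 %
print(bool(inline.flags & re.VERBOSE))   # True: the inline group set the flag
```

Both forms should behave identically here; the inline form is convenient when the pattern is stored as plain text (for example in a grammar file) and there is no place to pass a flag argument.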

#2

The JavaScript regex engine does not support the VERBOSE modifier x, either as an inline modifier or as a regular flag.

See Free-Spacing: x (except JavaScript) at rexegg.com:

By default, any space in a regex string specifies a character to be matched. In languages where you can write regex strings on multiple lines, the line breaks also specify literal characters to be matched. Because you cannot insert spaces to separate groups that carry different meanings (as you do between phrases and paragraphs when you write in English), a regex can become hard to read...

Luckily, many engines support a free-spacing mode that allows you to aerate your regex. For instance, you can add spaces between the tokens.

You may also see it called whitespace mode, comment mode or verbose mode.
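
To make the difference concrete, here is a small sketch in Python (the pattern and inputs are invented for illustration). It runs the same pattern text in default mode, where the space is a literal character, and in verbose mode, where the unescaped space is ignored.

```python
import re

pattern = r"\d+ %"   # one space between the digits and the percent sign

# Default mode: the space in the pattern must match a space in the input.
default_with_space = re.search(pattern, "56 %") is not None
default_no_space = re.search(pattern, "56%") is not None

# Verbose mode: the unescaped space is ignored, so the pattern acts like \d+%.
verbose_no_space = re.search(pattern, "56%", re.VERBOSE) is not None
verbose_with_space = re.search(pattern, "56 %", re.VERBOSE) is not None

print(default_with_space, default_no_space)   # True False
print(verbose_no_space, verbose_with_space)   # True False
```

In verbose mode a space that must be matched literally has to be escaped or written as a character class, for example `\ ` or `[ ]`.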

Here is how it can look in Python:

import re
regex = r"""(?x)
\d+                # Digits
\D+                # Non-digits up to...
$                  # The end of string
"""
print(re.search(regex, "My value: 56%").group(0)) # => 56%
