Python: How to do line continuation with a long regex? [duplicate]

This question already has an answer here:

Pythonic way to create a long multi-line string (18 answers)

I have a long regex that I want to continue on to the next line, but everything I've tried either gives me an EOL error or breaks the regex. I have already continued the line once within the parentheses, and have read, among other things, "How can I do a line break (line continuation) in Python?"

Working, but still too long:

REGEX = re.compile(
            r'\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)')

Wrong:

REGEX = re.compile(
            r'\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\)[A-Z0-9]+
            )\s+([a-zA-Z\d-]+)')

SyntaxError: EOL while scanning string literal


REGEX = re.compile(
            r'\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\
                )[A-Z0-9]+)\s+([a-zA-Z\d-]+)')

sre_constants.error: unbalanced parenthesis


REGEX = re.compile(
            r'\d\s+\d+\s+([A-Z0-9-]+)\s+( \
            [0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)')

The regex compiles, but it no longer matches.


REGEX = (re.compile(
            r'\d\s+\d+\s+([A-Z0-9-]+)\s+(
            [0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)'))

SyntaxError: EOL while scanning string literal
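The two backslash attempts above fail for the same reason: in a raw string literal, a backslash followed by a newline is not a line continuation. Both characters stay in the string, so the regex engine sees a backslash, a newline, and the indentation spaces. A minimal check (the strings `ab`/`cd` are illustrative only):

```python
import re

# A raw string with a backslash at the end of a line: the backslash and
# the newline are both kept in the value; no line continuation happens.
s = r'ab\
cd'
print(repr(s))  # 'ab\\\ncd'

# In a regex, "\<newline>" matches a literal newline, so this pattern
# only matches text that really contains a line break.
pattern = re.compile(s)
print(pattern.fullmatch('abcd'))    # None
print(pattern.fullmatch('ab\ncd'))  # a match object
```

The same thing happens in the question: the newline and the leading spaces end up inside the pattern, which either unbalances the parentheses or requires whitespace the input does not contain.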

I have been able to shorten my regex so that this is no longer an issue, but I'm now interested to know how I might do line continuation with a long regex?

3 Answers

#1

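If you use the re.VERBOSE flag, you can split your regular expression across as many lines as you like to make it more readable. In verbose mode, whitespace in the pattern is ignored (except inside a character class or when escaped with a backslash), and an unescaped # starts a comment that runs to the end of the line: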

import re

pattern = r"""
    \d\s+
    \d+\s+
    ([A-Z0-9-]+)\s+
    ([0-9]+.\d\(\d\)[A-Z0-9]+)\s+
    ([a-zA-Z\d-]+)"""

# re.VERBOSE ignores the newlines and indentation inside the pattern
REGEX = re.compile(pattern, re.VERBOSE)

This approach is explained in Dive Into Python - Verbose Regular Expressions.
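A sketch that checks the verbose version against the one-line pattern from the question, with a comment on each line. The sample input line is a made-up assumption chosen to fit the pattern, not data from the question. The `.` is left unescaped as in the original, so it matches any character; use `\.` if a literal dot is intended.

```python
import re

# The one-line pattern from the question, for comparison.
ONE_LINE = re.compile(
    r'\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)')

# The same pattern in verbose mode: newlines and indentation are ignored,
# and everything from an unescaped "#" to the end of the line is a comment.
VERBOSE_REGEX = re.compile(r"""
    \d\s+                          # one digit, then whitespace
    \d+\s+                         # a number, then whitespace
    ([A-Z0-9-]+)\s+                # group 1, e.g. ABC-1
    ([0-9]+.\d\(\d\)[A-Z0-9]+)\s+  # group 2, e.g. 12.2(1)SE4
    ([a-zA-Z\d-]+)                 # group 3, e.g. some-host
""", re.VERBOSE)

line = '1 23 ABC-1 12.2(1)SE4 some-host'  # hypothetical sample input
print(ONE_LINE.match(line).groups())       # ('ABC-1', '12.2(1)SE4', 'some-host')
print(VERBOSE_REGEX.match(line).groups())  # same tuple

# Caveat: in verbose mode a literal space must be escaped (or put in a class).
print(re.fullmatch(r"a b", "a b", re.VERBOSE))   # None: the space is ignored
print(re.fullmatch(r"a\ b", "a b", re.VERBOSE))  # a match object
```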

#2

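You can write the pattern as several string literals on separate lines. Python concatenates adjacent string literals at compile time, before the result is passed to re.compile, as long as the line break falls inside brackets (here, the parentheses of the re.compile call), which makes it an implicit line continuation. Give each literal its own r prefix. Example: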

import re

REGEX = re.compile(r"\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\)"
                   r"[A-Z0-9]+)\s+([a-zA-Z\d-]+)")
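A quick way to convince yourself that the split does not change the regex: the two spellings below produce identical strings, because the joining happens before re.compile runs. The second part shows why each piece needs its own r prefix; the prefix applies to one literal only.

```python
# Adjacent literals are joined by the compiler into a single string.
single = r"\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)"
split = (r"\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\)"
         r"[A-Z0-9]+)\s+([a-zA-Z\d-]+)")
print(single == split)  # True

# The r prefix covers only its own literal: here "\t" is a real tab.
mixed = r"\d" "\t"
print(repr(mixed))  # '\\d\t'
```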

#3

Try splitting the pattern into adjacent raw string literals:

import re

regex = re.compile(
    r'\d\s+\d+\s+([A-Z0-9-]+)\s+('
    r'[0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)'
)
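A variation on the same idea, as a sketch: since the fragments are joined before re.compile sees them, the split can fall anywhere (even inside a group, as above), but splitting at group boundaries keeps each group on one line and leaves room for a # comment per fragment. The comments and the sample input line are illustrative assumptions, not part of the question.

```python
import re

regex = re.compile(
    r"\d\s+\d+\s+"                    # one digit and a number, each followed by whitespace
    r"([A-Z0-9-]+)\s+"                # group 1
    r"([0-9]+.\d\(\d\)[A-Z0-9]+)\s+"  # group 2 ('.' matches any character)
    r"([a-zA-Z\d-]+)"                 # group 3
)

m = regex.match('1 23 ABC-1 12.2(1)SE4 some-host')  # hypothetical input
print(m.groups())  # ('ABC-1', '12.2(1)SE4', 'some-host')
```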

#1