How to split a very long regular expression in Python

Date: 2022-02-11 21:40:13

I have a regular expression that is very long:

 vpa_pattern = '(VAP) ([0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}): (.*)'

My code to match the groups is as follows:

import re

class ReExpr:
    def __init__(self):
        self.string = None
        self.rematch = None

    def search(self, regexp, string):
        # Remember the match object so group() can be called afterwards
        self.string = string
        self.rematch = re.search(regexp, self.string)
        return bool(self.rematch)

    def group(self, i):
        return self.rematch.group(i)

m = ReExpr()

# line holds one line of the input being parsed
if m.search(vpa_pattern, line):
    print(m.group(1))
    print(m.group(2))
    print(m.group(3))

I tried to split the regular expression pattern across multiple lines in the following ways:

vpa_pattern = '(VAP) \
    ([0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}):\
    (.*)'

I even tried:

 vpa_pattern = re.compile(('(VAP) \
    ([0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}):\
    (.*)'))

But neither approach works. My pattern has a literal space between the groups, and I guess those spaces are not picked up correctly once I split the pattern across multiple lines.
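A small sketch of what is probably going wrong here: a backslash at the end of a line inside a string literal joins the lines, but the indentation of the next line stays in the string, so the pattern ends up demanding several spaces instead of one. The shortened pattern and the sample text below are made up for illustration.

```python
import re

# Backslash continuation inside the quotes: the leading spaces of the
# second line become part of the pattern itself.
broken = '(VAP) \
    (.*)'

intended = '(VAP) (.*)'

sample = "VAP rest of line"  # hypothetical input with a single space

broken_matches = re.search(broken, sample) is not None
intended_matches = re.search(intended, sample) is not None

print(repr(broken))
print(broken_matches, intended_matches)
```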

3 Answers

#1


4  

Look at the re.X (re.VERBOSE) flag. It allows comments in the regex and ignores whitespace in the pattern, except inside a character class or when escaped.

a = re.compile(r"""\d +  # the integral part
               \.    # the decimal point
               \d *  # some fractional digits""", re.X)
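Applied to the pattern from the question, this might look like the sketch below. The literal spaces are written as [ ] so that re.X keeps them. The name vap_pattern and the sample line are assumptions for illustration, not taken from the question.

```python
import re

vap_pattern = re.compile(r"""
    (VAP)[ ]            # group 1: the literal text VAP, then one space
    (                   # group 2: six hex pairs separated by colons
      [0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:
      [0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}
    )
    :[ ]                # trailing colon and one space
    (.*)                # group 3: the rest of the line
""", re.X)

sample = "VAP 00:1a:2B:3c:4D:5e: up and running"  # hypothetical input line
groups = vap_pattern.search(sample).groups()
print(groups)
```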

#2


3  

Python concatenates adjacent string literals, so a string can be written in several parts if it is enclosed in parentheses:

>>> text = ("alfa" "beta"
...         "gama")
>>> text
'alfabetagama'

or in your code:

text = ("alfa" "beta"
        "gama" "delta"
        "omega")
print(text)

which prints:

alfabetagamadeltaomega
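For the pattern in the question, the same idea could be sketched as follows. Each piece is its own literal, so no indentation leaks into the pattern. The name vap_pattern and the sample line are assumptions for illustration.

```python
import re

# Adjacent literals inside parentheses are joined at compile time
vap_pattern = (
    r"(VAP) "
    r"([0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:"
    r"[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}): "
    r"(.*)"
)

# The single-line pattern from the question, for comparison
one_line = '(VAP) ([0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}): (.*)'

sample = "VAP 00:1a:2B:3c:4D:5e: up and running"  # hypothetical input line
groups = re.search(vap_pattern, sample).groups()
print(vap_pattern == one_line)
print(groups)
```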

#3


1  

It's actually quite simple. You already use the {} notation, so use it again. Instead of:

'([0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}):'

which is just [0-9A-Fa-f]{2}: repeated 6 times, you can use:

'([0-9A-Fa-f]{2}:){6}'

We can even simplify it further by using \d to represent digits (in Python 3, \d on a str pattern also matches non-ASCII decimal digits unless re.ASCII is set):

'([\dA-Fa-f]{2}:){6}'

NOTE: Depending on which re function you use, you can pass in re.IGNORECASE (re.I) and simplify that chunk down to [\da-f]{2}:
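A minimal sketch of the case-insensitive variant, using re.compile with the re.IGNORECASE flag. The sample text is made up for illustration.

```python
import re

# With re.IGNORECASE, a-f also covers A-F
pair_pattern = re.compile(r'([\da-f]{2}:){6}', re.IGNORECASE)

sample = "00:1A:2b:3C:4d:5E: trailing text"  # hypothetical input
matched = pair_pattern.search(sample).group(0)
print(matched)
```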

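So your final regex is:

'(VAP) ([\dA-Fa-f]{2}:){6} (.*)'

Note that a repeated capturing group keeps only its last repetition, so group 2 of this pattern holds just the final pair rather than the whole address as in the original pattern.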
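It is worth checking what the groups contain after this change. The sketch below compares the shortened pattern with one variant that wraps the repetition in an outer group, so that group 2 again holds the full address without the trailing colon. The variant and the sample line are my assumptions, not part of the answer.

```python
import re

line = "VAP 00:1a:2B:3c:4D:5e: up and running"  # hypothetical input line

# Shortened pattern: the repeated group is overwritten on each repetition
short = re.search(r'(VAP) ([\dA-Fa-f]{2}:){6} (.*)', line)
last_pair = short.group(2)

# Variant: non-capturing repetition inside one capturing group
full = re.search(r'(VAP) ((?:[\dA-Fa-f]{2}:){5}[\dA-Fa-f]{2}): (.*)', line)
mac = full.group(2)

print(last_pair)
print(mac)
```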
