Extracting part of a multi-line string using a regular expression

I am trying to extract the line below from a multi-line string:

eth6.36   Link encap:Ethernet  HWaddr A0:36:9F:5F:24:EE  \r\n          inet addr:36.36.36.10  Bcast:36.36.36.255  Mask:255.255.255.0\r\n          inet6 addr: fe80::a236:9fff:fe5f:24ee/64

When I try to extract just the eth6.36 Link encap part, I get an error:

test = 'ifconfig eth6.36\r\neth6.36   Link encap:Ethernet  HWaddr A0:36:9F:5F:24:EE  \r\n          inet addr:36.36.36.10  Bcast:36.36.36.255  Mask:255.255.255.0\r\n          inet6 addr: fe80::a236:9fff:fe5f:24ee/64 Scope:Link\r\n          UP BROADCAST MULTICAST  MTU:9000  Metric:1\r\n          RX packets:0 errors:0 dropped:0 overruns:0 frame:0\r\n          TX packets:62 errors:0 dropped:0 overruns:0 carrier:0\r\n          collisions:0 txqueuelen:0 \r\n          RX bytes:0 (0.0 b)  TX bytes:7004 (6.8 KiB)\r\n\r\n'

match = re.match(r'(eth6.36\sLink encap:)', test)
print(match.groups())
...
AttributeError: 'NoneType' object has no attribute 'groups'

Any ideas?

3 Solutions

#1

The regex was malformed: the dot needs escaping, \s needs a quantifier, and re.search should be used in place of re.match. This is what you want:

import re
test = 'ifconfig eth6.36\r\neth6.36   Link encap:Ethernet  HWaddr A0:36:9F:5F:24:EE  \r\n          inet addr:36.36.36.10  Bcast:36.36.36.255  Mask:255.255.255.0\r\n          inet6 addr: fe80::a236:9fff:fe5f:24ee/64 Scope:Link\r\n          UP BROADCAST MULTICAST  MTU:9000  Metric:1\r\n          RX packets:0 errors:0 dropped:0 overruns:0 frame:0\r\n          TX packets:62 errors:0 dropped:0 overruns:0 carrier:0\r\n          collisions:0 txqueuelen:0 \r\n          RX bytes:0 (0.0 b)  TX bytes:7004 (6.8 KiB)\r\n\r\n'

match = re.search(r'(eth6\.36\s*Link encap:)', test)
print(match.groups())

Output:

('eth6.36   Link encap:',)

#2

re.match only matches at the beginning of the string. Use re.search instead, since it matches anywhere in the string:

>>> match = re.search(r'(eth6.36\s+Link encap:)', test)
>>> print(match.groups())
('eth6.36   Link encap:',)

Also, you have to allow for multiple whitespace characters with \s+ (note the +), because a bare \s matches exactly one whitespace character.
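
The difference between the two calls, and between \s and \s+, can be checked on a shortened version of the string. This is only a minimal sketch: the name sample and the shortened text are assumptions made for illustration, not part of the original post.

```python
import re

# Shortened stand-in for the ifconfig output in the question (assumption).
sample = 'ifconfig eth6.36\r\neth6.36   Link encap:Ethernet  HWaddr A0:36:9F:5F:24:EE  \r\n'

pattern = r'eth6\.36\s+Link encap:'

# re.match only tries at position 0, where the text starts with 'ifconfig'.
at_start = re.match(pattern, sample)

# re.search scans forward and finds the match on the second line.
anywhere = re.search(pattern, sample)

# A single \s cannot span the three spaces between 'eth6.36' and 'Link'.
single_space = re.search(r'eth6\.36\sLink encap:', sample)

print(at_start)          # None
print(anywhere.group())  # eth6.36   Link encap:
print(single_space)      # None
```

Calling .groups() or .group() on the None results is what raises the AttributeError shown in the question.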

#3

Use findall instead. The re.M flag only changes how ^ and $ behave, so it is optional for this pattern. You also need a quantifier for \s.

>>> re.findall(r'(eth6.36\s+Link encap:)',test, re.M)
['eth6.36   Link encap:']

If you are sure there will be only one result, use search and remove the grouping parentheses:

>>> re.search(r'eth6.36\s+Link encap:',test).group()
'eth6.36   Link encap:'
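
To make the different return types concrete, here is a small sketch that runs both calls on a shortened string. It also builds the pattern with re.escape so the dot in the interface name is matched literally. The names sample and iface are hypothetical and used only for illustration.

```python
import re

# Shortened stand-in for the ifconfig output in the question (assumption).
sample = 'ifconfig eth6.36\r\neth6.36   Link encap:Ethernet  HWaddr A0:36:9F:5F:24:EE  \r\n'

iface = 'eth6.36'  # hypothetical variable holding the interface name

# re.escape makes the '.' in the name match only a literal dot.
pattern = re.escape(iface) + r'\s+Link encap:'

all_hits = re.findall(pattern, sample)   # list of matched strings (may be empty)
first_hit = re.search(pattern, sample)   # match object, or None if nothing matched

print(all_hits)
if first_hit is not None:
    print(first_hit.group())
```

Checking for None before calling .group() avoids the AttributeError when the interface is missing from the output.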
