Finding a substring in a string with Python regex

Date: 2022-09-07 00:26:52

I have a string:

 <robot generated="20170330 17:19:11.956" generator="Robot 3.0.2 (Python 2.7.13 on win32)">

I want to get the value of "generated", but the code below doesn't work:

import re
doc=r'<robot generated="20170330 17:19:11.956" generator="Robot 3.0.2 (Python 2.7.13 on win32)">'
match = re.match(r'generated="(\d+ \d+:\d+:\d+.\d+)',doc)

The value of match is None. Can anyone help?

2 solutions

#1


re.match matches only at the beginning of the string. Use re.search instead, which scans the whole string and matches at any position.

>>> import re
>>> doc=r'<robot generated="20170330 17:19:11.956" generator="Robot 3.0.2 (Python 2.7.13 on win32)">'
>>> re.search(r'generated="(\d+ \d+:\d+:\d+\.\d+)',doc)
<_sre.SRE_Match object at 0x1010505d0>

>>> re.search(r'generated="(\d+ \d+:\d+:\d+\.\d+)',doc).group()
'generated="20170330 17:19:11.956'

>>> re.search(r'generated="(\d+ \d+:\d+:\d+\.\d+)',doc).group(1)
'20170330 17:19:11.956'

See "search() vs. match()" in the re module documentation.
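
To make the difference concrete, here is a minimal sketch that runs both calls against the string from the question. The variable names (pattern, anchored, found) are only illustrative. Note that the dot before the milliseconds is escaped here; an unescaped dot, as in the question's pattern, would match any character rather than a literal ".".

```python
import re

doc = '<robot generated="20170330 17:19:11.956" generator="Robot 3.0.2 (Python 2.7.13 on win32)">'

# One compiled pattern, used with both methods for a direct comparison.
pattern = re.compile(r'generated="(\d+ \d+:\d+:\d+\.\d+)"')

# match() only tries position 0, where the string starts with "<robot",
# so it returns None.
anchored = pattern.match(doc)

# search() scans forward and finds the attribute wherever it occurs.
found = pattern.search(doc)

# Guard against None before calling group(), in case the input has no match.
generated = found.group(1) if found else None

print(anchored)   # None
print(generated)  # 20170330 17:19:11.956
```

Checking the result for None before calling group() avoids an AttributeError when the attribute is missing from the input.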

#2


You don't necessarily need regular expressions in this case. Here is an alternative that uses the BeautifulSoup XML/HTML parser together with the dateutil datetime parser:

In [1]: from dateutil.parser import parse

In [2]: from bs4 import BeautifulSoup

In [3]: data = '<robot generated="20170330 17:19:11.956" generator="Robot 3.0.2 (Python 2.7.13 on win32)">'

In [4]: parse(BeautifulSoup(data, "html.parser").robot['generated'])
Out[4]: datetime.datetime(2017, 3, 30, 17, 19, 11, 956000)

I find this approach beautiful, easy and straightforward.
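
If installing bs4 and dateutil is not an option, roughly the same datetime can be obtained with the standard library alone. This is only a sketch, not part of the answer above: it assumes the timestamp always has the "YYYYMMDD HH:MM:SS.fff" layout shown in the question, and the name generated_at is made up for illustration.

```python
import re
from datetime import datetime

data = '<robot generated="20170330 17:19:11.956" generator="Robot 3.0.2 (Python 2.7.13 on win32)">'

# Grab everything between the quotes of the generated attribute.
m = re.search(r'generated="([^"]+)"', data)

# %f accepts one to six fractional digits, so "956" becomes 956000 microseconds.
generated_at = datetime.strptime(m.group(1), "%Y%m%d %H:%M:%S.%f") if m else None

print(generated_at)
```

Unlike dateutil's parse, strptime raises ValueError if the layout differs from the format string, which can be useful when the input is expected to be strictly formatted.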
