Python regex to find all digits and dots

Time: 2022-05-02 22:32:43

I'm using re.findall() to extract some version numbers from an HTML file:

>>> import re
>>> text = "<table><td><a href=\"url\">Test0.2.1.zip</a></td><td>Test0.2.1</td></table> Test0.2.1"
>>> re.findall(r"Test([\.0-9]*)", text)
['0.2.1.', '0.2.1', '0.2.1']

but I would like to only get the ones that do not end in a dot. The filename might not always be .zip so I can't just stick .zip in the regex.

I wanna end up with:

['0.2.1', '0.2.1']

Can anyone suggest a better regex to use? :)

1 solution

#1


12  

re.findall(r"Test([0-9.]*[0-9]+)", text)

or, a bit shorter:

re.findall(r"Test([\d.]*\d+)", text)
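A quick way to check the pattern against the sample text from the question. This is only a sketch: the variable names `versions` and `strict` are made up for illustration, and the stricter `\d+(?:\.\d+)*` pattern is an alternative not given in the answer. Note that the sample text contains three occurrences of `Test0.2.1`, so three matches come back.

```python
import re

text = '<table><td><a href="url">Test0.2.1.zip</a></td><td>Test0.2.1</td></table> Test0.2.1'

# Greedy [\d.]* first consumes "0.2.1." in "Test0.2.1.zip"; the trailing \d+
# then forces backtracking until the captured group ends on a digit.
versions = re.findall(r"Test([\d.]*\d+)", text)
print(versions)  # ['0.2.1', '0.2.1', '0.2.1']

# Hypothetical stricter variant: digit groups separated by single dots,
# which also rejects strings such as "..1".
strict = re.findall(r"Test(\d+(?:\.\d+)*)", text)
print(strict)  # ['0.2.1', '0.2.1', '0.2.1']
```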

By the way, you don't need to escape the dot in a character class. In Python both forms match the same characters:

[\.0-9]  // matches: 0 1 2 3 4 5 6 7 8 9 .
[.0-9]   // matches: 0 1 2 3 4 5 6 7 8 9 .
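In Python's `re` module, a backslash-escaped dot inside a character class is still just a literal dot, and the backslash itself is not added to the set. A small check, using a made-up `sample` string that contains a literal backslash:

```python
import re

sample = r"a\.1"  # the characters: a, backslash, dot, 1

escaped = re.findall(r"[\.0-9]", sample)
unescaped = re.findall(r"[.0-9]", sample)

print(escaped)    # ['.', '1']
print(unescaped)  # ['.', '1']
```

Neither pattern picks up the backslash in `sample`, so the escape is harmless but unnecessary.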
