I would like to extract a number from a large HTML file with Python. My idea was to use a regex like this:
import re
text = 'gfgfdAAA1234ZZZuijjk'
try:
    found = re.search('AAA(.+?)ZZZ', text).group(1)
except AttributeError:
    found = ''
found
Unfortunately, I'm not used to regex, and I have failed to adapt this example to extract 0,54125 from:
(...)<div class="vk_ans vk_bk">0,54125 count id</div>(...)
Is there another way to extract the number, or could someone help me with the regex?
2 solutions
#1
0
If you want the output to be 0,54125 (that is, text matching \d+,\d+), you need to anchor the match on some surrounding context.
Given the following input:
(...)<div class="vk_ans vk_bk">0,54125 count id</div>(...)
you can extract 0,54125 with a lookbehind regex such as:
(?<=>)\d+,\d+
or, anchored on the full opening tag:
(?<=<div class="vk_ans vk_bk">)\d+,\d+
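As a minimal sketch of how these two lookbehind patterns could be used from Python, assuming the sample line from the question is the input (the variable names `html`, `loose` and `strict` are only illustrative):

```python
import re

# Sample input taken from the question.
html = '(...)<div class="vk_ans vk_bk">0,54125 count id</div>(...)'

# Lookbehind on any '>' character: matches a number right after any tag end.
loose = re.search(r'(?<=>)\d+,\d+', html)

# Lookbehind on the exact opening tag: only matches inside this div.
strict = re.search(r'(?<=<div class="vk_ans vk_bk">)\d+,\d+', html)

# re.search returns None when nothing matches, so guard before calling group().
loose_value = loose.group(0) if loose else ''
strict_value = strict.group(0) if strict else ''

print(loose_value)
print(strict_value)
```

Python's `re` module only accepts fixed-width lookbehinds; both patterns here qualify because the text inside `(?<=...)` is a literal string. On a large file, the pattern anchored on the full tag is less likely to pick up an unrelated number.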
#2
0
You can replace some characters in your text before searching it. For example, to capture numbers like 12,34 you can do this:
import re
text = 'gfgfdAAA12,34ZZZuijjk'
try:
    # Remove the comma so the digits form one contiguous run.
    text = text.replace(',', '')
    found = re.search(r'AAA(\d+)ZZZ', text).group(1)
except AttributeError:
    # re.search returned None: no match.
    found = ''
print(found)
# 1234
If you need to capture the digits inside a line, you can make your pattern more general, like this:
text = '<div class="vk_ans vk_bk">0,54125 count id</div>'
text = text.replace(',', '')
found = re.search(r'(\d+)', text).group(1)
print(found)
# 054125
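Note that removing the comma also removes the information about where the decimal separator was. A possible variation, sketched here under the assumption that the comma in 0,54125 is a decimal separator, keeps both halves of the number and converts the result to a float. The helper name `extract_decimal` is hypothetical and not part of the answer above.

```python
import re

def extract_decimal(html):
    """Return the first 'digits,digits' number in html as a float, or None."""
    # Capture the integer part and the fractional part separately.
    match = re.search(r'(\d+),(\d+)', html)
    if match is None:
        return None
    # Assumption: the comma is a decimal separator, so swap it for a dot.
    return float(match.group(1) + '.' + match.group(2))

value = extract_decimal('<div class="vk_ans vk_bk">0,54125 count id</div>')
print(value)
```

Returning `None` instead of raising lets the caller decide what to do when the page contains no such number.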