I am iterating over the lines of a text file with a Python script. I want to search for an img tag within the text and return the tag as text.
When I run re.match on a line, it returns a _sre.SRE_Match object. How do I get it to return a string?
import sys
import string
import re
f = open("sample.txt", 'r')
l = open('writetest.txt', 'w')
count = 1
for line in f:
    line = line.rstrip()
    imgtag = re.match(r'<img.*?>',line)
    print("yo it's a {}".format(imgtag))
When run, it prints:
yo it's a None
yo it's a None
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e5e0>
yo it's a None
yo it's a None
4 Solutions
#1
46
You should use re.MatchObject.group(0), that is, call group(0) on the match object that re.match returns. Like
imgtag = re.match(r'<img.*?>', line).group(0)
Edit:
You also might be better off doing something like
imgtag = re.match(r'<img.*?>',line)
if imgtag:
    print("yo it's a {}".format(imgtag.group(0)))
to eliminate all the None results.
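The reason for the guard: re.match returns None when nothing matches, so chaining .group(0) directly onto it fails on those lines. Here is a minimal sketch of the same check wrapped in a helper. The name first_img_tag and the sample lines are made up for illustration and are not from the question.

```python
import re

def first_img_tag(line):
    """Return the <img ...> tag at the start of line, or None if there is none."""
    match = re.match(r'<img.*?>', line)
    # re.match returns None on failure, so test it before calling group()
    return match.group(0) if match else None

# Hypothetical sample lines standing in for the contents of sample.txt
lines = ['<img src="a.png"> caption', 'no tag here']
results = [first_img_tag(line) for line in lines]
print(results)
```

Lines without a tag come back as None here, so the caller can skip them instead of hitting an AttributeError.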
#2
6
Considering there might be several img tags, I would recommend re.findall:
import re
with open("sample.txt", 'r') as f_in, open('writetest.txt', 'w') as f_out:
    for line in f_in:
        for img in re.findall(r'<img[^>]+>', line):
            # write() works on both Python 2 and 3, unlike "print >> f_out"
            f_out.write("yo it's a {}\n".format(img))
#3
3
Use imgtag.group(0) or imgtag.group(). Either returns the entire match as a string. Your pattern has no capture groups, so there is nothing else to retrieve.
http://docs.python.org/release/2.5.2/lib/match-objects.html
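To make the point about groups concrete, here is a small sketch. The sample string and the second pattern (which adds a capture group around the src value) are assumptions for illustration only.

```python
import re

# Hypothetical sample text, not taken from the question's file
text = 'text <img src="a.png"> more'

match = re.search(r'<img.*?>', text)
whole = match.group()       # no argument: the entire match
same = match.group(0)       # group(0) is also the entire match
captured = match.groups()   # empty tuple, because the pattern has no capture groups

# Illustrative pattern with one capture group around the src value
match2 = re.search(r'<img\s+src="([^"]*)"', text)
src = match2.group(1)       # only the text captured by the parentheses

print(whole, captured, src)
```

group() and group(0) are interchangeable. Numbered groups from 1 upward only exist once the pattern contains parentheses.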
#4
1
Note that re.match(pattern, string, flags=0) only returns matches at the beginning of the string. If you want to locate a match anywhere in the string, use re.search(pattern, string, flags=0) instead (https://docs.python.org/3/library/re.html). This will scan the string and return the first match object. Then you can extract the matching string with match_object.group(0), as the others suggested.
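A short sketch of the difference, using a made-up line in which the tag is not at the start:

```python
import re

# Hypothetical line: the img tag appears mid-string
line = 'Some text <img src="a.png"> more text'
pattern = r'<img.*?>'

at_start = re.match(pattern, line)    # only tries to match at position 0
anywhere = re.search(pattern, line)   # scans the string for the first match

print(at_start)
print(anywhere.group(0))
```

With re.match this line yields None, which would explain many of the None lines in the question's output if the tags there are preceded by other text.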