How do I return the string matched by a regex in Python?

Time: 2023-01-14 14:14:40

I am running through lines in a text file using a python script. I want to search for an img tag within the text document and return the tag as text.

When I run the regex with re.match(pattern, line), it returns a _sre.SRE_Match object. How do I get it to return a string?

import sys
import string
import re

f = open("sample.txt", 'r' )
l = open('writetest.txt', 'w')

count = 1

for line in f:
    line = line.rstrip()
    imgtag  = re.match(r'<img.*?>',line)
    print("yo it's a {}".format(imgtag))

When run it prints:

yo it's a None
yo it's a None
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e5e0>
yo it's a None
yo it's a None

4 solutions

#1


46  

You should call group(0) on the match object that re.match returns. Like

imgtag = re.match(r'<img.*?>', line).group(0)

Edit:

You also might be better off doing something like

imgtag = re.match(r'<img.*?>', line)
if imgtag:
    print("yo it's a {}".format(imgtag.group(0)))

to eliminate all the Nones.

#2


6  

Considering there might be several img tags I would recommend re.findall:

import re

with open("sample.txt", 'r') as f_in, open('writetest.txt', 'w') as f_out:
    for line in f_in:
        for img in re.findall(r'<img[^>]+>', line):
            # print(..., file=...) is the Python 3 form of "print >> f_out, ..."
            print("yo it's a {}".format(img), file=f_out)
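
A minimal sketch of why no .group() call is needed here: when the pattern has no capturing groups, re.findall returns the matched text directly as a list of strings. The sample line below is a made-up stand-in, not taken from the asker's sample.txt.

```python
import re

# Hypothetical input line, used only for illustration.
line = '<p><img src="a.png"> and <img src="b.png"></p>'

# No capturing groups in the pattern, so each list item is a whole match (str).
tags = re.findall(r'<img[^>]+>', line)

for tag in tags:
    print("yo it's a {}".format(tag))
```

A line with no img tag simply yields an empty list, so the inner loop body never runs and no None handling is required.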

#3


3  

imgtag.group(0) or imgtag.group(). This returns the entire match as a string. You are not capturing anything else either.

http://docs.python.org/release/2.5.2/lib/match-objects.html
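
To illustrate the difference between the whole match and a captured group, here is a small sketch. The sample line and the pattern with a parenthesized group are assumptions added for illustration; the asker's original pattern has no groups, so only group(0) is available there.

```python
import re

# Hypothetical input line, used only for illustration.
line = '<img src="cat.png" alt="cat">'

# The parentheses around [^"]* create capturing group 1 (the src value).
m = re.match(r'<img\s+src="([^"]*)".*?>', line)

if m is not None:
    whole = m.group(0)   # the entire matched text
    same = m.group()     # group() with no argument is equivalent to group(0)
    src = m.group(1)     # only the text captured by the first group
    print(whole)
    print(src)
```

Asking for a group number that the pattern does not define, such as group(1) on the asker's pattern, raises IndexError.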

#4


1  

Note that re.match(pattern, string, flags=0) only returns a match at the beginning of the string. If you want to locate a match anywhere in the string, use re.search(pattern, string, flags=0) instead (https://docs.python.org/3/library/re.html). This will scan the string and return the first match object. Then you can extract the matching string with match_object.group(0), as the other answers suggest.
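
The difference can be seen in a short sketch. The sample line and the helper name first_img_tag are hypothetical, introduced only for this example; this behavior may also account for some of the None results in the question, for lines where the tag is not the first thing on the line.

```python
import re

# Hypothetical input line where the tag is not at the start.
line = 'Some text <img src="dog.png"> more text'
pattern = r'<img.*?>'

at_start = re.match(pattern, line)    # only tries to match at position 0
anywhere = re.search(pattern, line)   # scans forward for the first match

print(at_start)            # no match at the start of the line
print(anywhere.group(0))   # the matched tag as a string


def first_img_tag(text):
    """Return the first img tag in text as a string, or None if absent."""
    m = re.search(r'<img.*?>', text)
    return m.group(0) if m else None
```

Returning None for lines without a tag keeps the caller's check explicit, for example `tag = first_img_tag(line)` followed by `if tag:`.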
