How to get the string between two points using a regular expression in Python 3?

Time: 2022-11-27 19:14:10

How to get the string between two points using regex or any other library in Python 3?

For example: Blah blah ABC the string to be retrieved XYZ Blah Blah

ABC and XYZ are variables that mark the start and end of the string I need to retrieve.

2 solutions

#1


5  

Use ABC and XYZ as anchors with look-behind and look-ahead assertions:

(?<=ABC).*?(?=XYZ)

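The (?<=...) look-behind assertion matches only at a position in the text that is immediately preceded by ABC. Similarly, (?=XYZ) matches only at a position that is immediately followed by XYZ. Neither assertion consumes any text. Together they form two anchors that bound the non-greedy .*? expression, which matches as few characters as possible (any character except a newline, unless re.DOTALL is set).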

You can find all such anchored pieces of text with re.findall():

for matchedtext in re.findall(r'(?<=ABC).*?(?=XYZ)', inputtext):
    print(matchedtext)
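As a rough, self-contained sketch of the re.findall() approach (the sample inputtext here is made up for illustration and is not from the question):

```python
import re

# Hypothetical sample input containing two ABC ... XYZ pairs.
inputtext = "Blah blah ABC first XYZ filler ABC second XYZ Blah"

# The look-arounds only assert that ABC / XYZ are present; they do not
# consume them, so each result holds just the text in between.
matches = re.findall(r'(?<=ABC).*?(?=XYZ)', inputtext)

for matchedtext in matches:
    # repr() makes the surrounding spaces visible
    print(repr(matchedtext))
```

The surrounding spaces are part of each result; calling .strip() on each piece removes them if they are not wanted.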

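If ABC and XYZ are held in variables, apply re.escape() to them (so that none of their content is interpreted as regular expression syntax) and interpolate the results into the pattern. Use re.search() rather than re.match() here: re.match() only tries the start of the string, where the look-behind can never succeed.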

re.search(r'(?<={}).*?(?={})'.format(re.escape(abc), re.escape(xyz)), inputtext)
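One possible way to wrap this up as a reusable function is sketched below. The function name between and its parameters are hypothetical, chosen for illustration only. It assumes both delimiters are non-empty literal strings and that only the first occurrence is wanted.

```python
import re

def between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None.

    `between`, `start` and `end` are illustrative names, not from the answer.
    """
    # re.escape() keeps characters such as '[' or '+' in the delimiters
    # from being read as regular expression syntax.
    pattern = r'(?<={}).*?(?={})'.format(re.escape(start), re.escape(end))
    # re.search() scans the whole string; re.match() would only try
    # position 0, where the look-behind cannot succeed.
    match = re.search(pattern, text)
    return match.group(0) if match else None

print(between("price: [a+b] done", "[", "]"))
```

Because the escaped delimiters are plain literals, the look-behind has a fixed width, which Python's re module requires.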

#2


2  

I think this is what you want:

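import re
match = re.search('ABC(.*)XYZ', 'Blah blah ABC the string to be retrieved XYZ Blah Blah')
if match:  # re.search() returns None when nothing matches
    print(match.group(1))  # group(1) is the text captured between ABC and XYZ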
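Note that (.*) is greedy, so with more than one ABC ... XYZ pair in the input it runs up to the last XYZ. The sketch below, using a made-up sample string, compares it with the non-greedy (.*?) form:

```python
import re

# Hypothetical input with two ABC ... XYZ pairs.
text = "x ABC one XYZ y ABC two XYZ z"

# Greedy: .* grabs as much as possible, so it stops at the LAST XYZ.
greedy = re.search(r'ABC(.*)XYZ', text).group(1)

# Non-greedy: .*? grabs as little as possible, so it stops at the FIRST XYZ.
lazy = re.search(r'ABC(.*?)XYZ', text).group(1)

print(repr(greedy))
print(repr(lazy))
```

With a single pair of delimiters, as in the question's example, both forms return the same text.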
