How to get the string between two points using regex or any other library in Python 3?
For example: Blah blah ABC the string to be retrieved XYZ Blah Blah
ABC and XYZ are variables that mark the start and end of the string I need to retrieve.
2 solutions
#1
5
Use ABC and XYZ as anchors with look-behind and look-ahead assertions:
(?<=ABC).*?(?=XYZ)
The (?<=ABC) look-behind assertion only matches at a location in the text that is preceded by ABC. Similarly, (?=XYZ) matches at a location that is followed by XYZ. Together they form two anchors that limit the .*? expression, which matches as few characters as possible (any character except a newline, by default).
You can find all such anchored pieces of text with re.findall():
for matchedtext in re.findall(r'(?<=ABC).*?(?=XYZ)', inputtext):
    print(matchedtext)
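As a minimal sketch of what that loop yields, here is a self-contained run. The sample inputtext is made up for illustration (it is not from the question) and contains two delimited pieces:

```python
import re

# Hypothetical sample input with two ABC ... XYZ pairs.
inputtext = "Blah ABC first piece XYZ blah ABC second piece XYZ Blah"

# The non-greedy .*? stops at the nearest XYZ, so each pair is matched separately.
pieces = re.findall(r'(?<=ABC).*?(?=XYZ)', inputtext)

for matchedtext in pieces:
    # strip() drops the spaces that sat next to the delimiters.
    print(matchedtext.strip())
```

Because the assertions do not consume ABC or XYZ, the returned strings contain only the text in between, including any surrounding whitespace.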
If ABC and XYZ are variable, use re.escape() on them (to prevent any of their content from being interpreted as regular expression syntax) and interpolate:
re.search(r'(?<={}).*?(?={})'.format(re.escape(abc), re.escape(xyz)), inputtext)
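One way to package this is a small helper. The function name between and the sample delimiters are assumptions chosen for illustration, not part of the original answer:

```python
import re

def between(text, start, end):
    """Return the first substring of text found between start and end, or None."""
    # re.escape() makes characters such as '.', '(' or '*' in the delimiters literal.
    pattern = r'(?<={}).*?(?={})'.format(re.escape(start), re.escape(end))
    # re.search() scans the whole string; re.match() would only try position 0.
    match = re.search(pattern, text)
    return match.group(0) if match else None

# Delimiters made of regex metacharacters still work once escaped.
print(between("Blah blah [*] the string to be retrieved (.) Blah", "[*]", "(.)"))
```

Note that re.search() is used rather than re.match(): the look-behind can never succeed at the very start of the string, so a match anchored there would always come back as None.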
#2
2
I think this is what you want:
import re
match = re.search('ABC(.*)XYZ','Blah blah ABC the string to be retrieved XYZ Blah Blah')
print(match.group(1))
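One caveat with this pattern: .* is greedy, so if XYZ occurs more than once it runs to the last occurrence. The sketch below contrasts it with the non-greedy .*? on a made-up input (the text variable is hypothetical) and guards against a failed search:

```python
import re

# Hypothetical input containing the end marker twice.
text = "Blah ABC one XYZ two XYZ Blah"

greedy = re.search(r'ABC(.*)XYZ', text)   # .* extends to the last XYZ
lazy = re.search(r'ABC(.*?)XYZ', text)    # .*? stops at the first XYZ

# re.search() returns None when nothing matches, so check before calling group().
if greedy and lazy:
    print(repr(greedy.group(1)))
    print(repr(lazy.group(1)))
```

With a single XYZ in the input, as in the question's example, both forms return the same text.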