While there are several posts on * that are similar to this, none of them involve a situation when the target string is one space after one of the substrings.
I have the following string (example_string): <insert_randomletters>[?] I want this string.Reduced<insert_randomletters>
I want to extract "I want this string." from the string above. The random letters will always change, but the quote "I want this string." will always sit between [?] (with a space after the closing square bracket) and Reduced.
Right now, I can do the following to extract "I want this string".
target_quote_object = re.search('[?](.*?)Reduced', example_string)
target_quote_text = target_quote_object.group(1)
print(target_quote_text[2:])
Slicing with [2:] drops the ] and the space that always appear at the start of my extracted string, so only "I want this string." is printed. However, this solution seems ugly, and I'd rather make re.search() return the target string without any further modification. How can I do this?
5 solutions
#1
4
Your '[?](.*?)Reduced' pattern matches a literal ?, then captures any 0+ chars other than line break chars, as few as possible, up to the first Reduced substring. That [?] is a character class formed with unescaped brackets, and the ? inside a character class is a literal ? char. The match therefore starts at the ? itself rather than at the opening bracket, and the capture begins right after it. That is why your Group 1 contains the ] and a space.
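To make the difference concrete, here is a small sketch that runs the unescaped and the escaped pattern against the question's example string. The variable names are illustrative only.

```python
import re

example_string = "<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>"

# Unescaped: [?] is a character class, so it consumes only the "?" character.
m = re.search(r"[?](.*?)Reduced", example_string)
unescaped_match = m.group(0)
unescaped_group = m.group(1)

# Escaped: \[\?] matches the three literal characters "[?]".
m = re.search(r"\[\?](.*?)Reduced", example_string)
escaped_group = m.group(1)

print(repr(unescaped_match))  # the whole match starts at "?", not at "["
print(repr(unescaped_group))  # begins with "] "
print(repr(escaped_group))    # begins with a single space
```

With the escaped pattern only the leading space is left in the group, which is what the \s* in the final pattern takes care of.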
To make your regex match [?], you need to escape [ and ? so that they are matched as literal chars. Besides, you need to add a space after ] to make sure it does not land in Group 1. A better idea is to use \s* (0 or more whitespace chars) or \s+ (1 or more whitespace chars).
Use
re.search(r'\[\?]\s*(.*?)Reduced', example_string)
import re
rx = r"\[\?]\s*(.*?)Reduced"
s = "<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>"
m = re.search(rx, s)
if m:
    print(m.group(1))
# => I want this string.
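If the delimiters may change, escaping them by hand gets error-prone. A minimal sketch, assuming you want a reusable helper: re.escape() from the standard library escapes the delimiters for you. The function name extract_between is hypothetical and not part of the original answer.

```python
import re

def extract_between(text, left, right):
    """Return the text between the first `left` and the next `right`, or None."""
    # re.escape turns "[?]" into a pattern that matches those literal characters.
    pattern = re.escape(left) + r"\s*(.*?)" + re.escape(right)
    m = re.search(pattern, text)
    return m.group(1) if m else None

s = "<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>"
result = extract_between(s, "[?]", "Reduced")
missing = extract_between("no markers here", "[?]", "Reduced")

print(result)
print(missing)
```

Returning None when nothing matches mirrors the `if m:` guard above, so the caller decides how to handle a missing quote.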
#2
1
The solution turned out to be:
target_quote_object = re.search('] (.*?)Reduced', example_string)
target_quote_text = target_quote_object.group(1)
print(target_quote_text)
However, Wiktor's solution is better.
#3
1
Regex may not be necessary for this, provided your string is in a consistent format:
mystr = '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'
res = mystr.split('Reduced')[0].split('] ')[1]
# 'I want this string.'
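Note that the chained split() raises IndexError when '] ' is missing. A hedged alternative, assuming you would rather get None in that case, is str.partition(), which always returns a 3-tuple. The helper name between is hypothetical.

```python
def between(text, left="[?] ", right="Reduced"):
    """Return the text between `left` and the following `right`, or None."""
    # partition returns (before, separator, after); separator is "" if not found.
    _, found_left, rest = text.partition(left)
    if not found_left:
        return None
    middle, found_right, _ = rest.partition(right)
    return middle if found_right else None

mystr = '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'
res = between(mystr)
no_match = between('nothing to see here')

print(res)
print(no_match)
```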
#4
1
You could (or should) use a positive lookbehind, (?<=\[\?\]):
import re
pattern=r'(?<=\[\?\])(\s\w.+?)Reduced'
string_data='<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'
print(re.findall(pattern,string_data)[0].strip())
output:
I want this string.
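A variant of the same idea, offered as a sketch rather than as the answerer's method: put the space inside the lookbehind and add a lookahead for Reduced, so the whole match is already the target string and no group or strip() is needed. Python's re module requires lookbehinds to be fixed-width, so a literal space is used here instead of \s*.

```python
import re

string_data = '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'

# (?<=\[\?\] ) asserts "[?] " precedes the match; (?=Reduced) asserts "Reduced" follows it.
m = re.search(r'(?<=\[\?\] ).*?(?=Reduced)', string_data)
whole_match = m.group(0) if m else None

print(whole_match)
```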
#5
0
Like the other answer, this might not be necessary, or it may just be too long-winded for Python. This method uses one of the common string methods, find.
- str.find(sub, start, end) returns the index of the first occurrence of sub in the substring str[start:end], or -1 if none is found.
- In each iteration, the index of [?] is retrieved, followed by the index of Reduced. The resulting substring is printed.
- Every time a [?]...Reduced pattern is found, the index is updated to point at the rest of the string, and the search continues from there.
Code
s = ' [?] Nice to meet you.Reduced efweww [?] Who are you? Reduced<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'
idx = s.find('[?]')
while idx != -1:  # compare by value; "is not" tests identity, not equality
    start = idx
    end = s.find('Reduced', idx)
    print(s[start + 3:end].strip())  # skip the 3 characters of '[?]'
    idx = s.find('[?]', end)
Output
$ python splmat.py
Nice to meet you.
Who are you?
I want this string.
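For comparison, the same multi-match extraction can be sketched with re.findall(), which returns the captured group for every non-overlapping match. This is an alternative to the find() loop, not a replacement the answer itself proposes.

```python
import re

s = (' [?] Nice to meet you.Reduced efweww [?] Who are you? Reduced'
     '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>')

# Each match captures the text between "[?]" and the next "Reduced".
quotes = [q.strip() for q in re.findall(r'\[\?\](.*?)Reduced', s)]

for quote in quotes:
    print(quote)
```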