Find a string between two substrings in Python when there is a space after the first substring

Time: 2021-02-22 19:21:07

While there are several posts on * that are similar to this, none of them involve a situation when the target string is one space after one of the substrings.

I have the following string (example_string): <insert_randomletters>[?] I want this string.Reduced<insert_randomletters>

I want to extract "I want this string." from the string above. The random letters will always change; however, the quote "I want this string." will always sit between [?] (with a space after the closing square bracket) and Reduced.

Right now, I can do the following to extract "I want this string".

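import re
example_string = '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'
target_quote_object = re.search('[?](.*?)Reduced', example_string)
target_quote_text = target_quote_object.group(1)
print(target_quote_text[2:])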

This strips the ] and the space that always appear at the start of my extracted string, so only "I want this string." is printed. However, this solution seems ugly, and I'd rather have re.search() return the target string directly, without any post-processing. How can I do this?

5 solutions

#1


4  

Your '[?](.*?)Reduced' pattern matches a literal ?, then captures any 0+ chars other than line break chars, as few as possible up to the first Reduced substring. That [?] is a character class formed with unescaped brackets, and the ? inside a character class is a literal ? char. That is why your Group 1 contains the ] and a space.
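To see this for yourself, here is a minimal sketch that runs the unescaped pattern against the sample string from the question and prints what was actually matched. The variable name unescaped is only for illustration.

```python
import re

example_string = "<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>"

# Unescaped: [?] is a character class that matches the single char "?",
# so the match starts at the "?" and the "] " falls into Group 1.
unescaped = re.search(r'[?](.*?)Reduced', example_string)
print(repr(unescaped.group(0)))  # '?] I want this string.Reduced'
print(repr(unescaped.group(1)))  # '] I want this string.'
```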

To make your regex match [?] you need to escape [ and ? and they will be matched as literal chars. Besides, you need to add a space after ] to actually make sure it does not land into Group 1. A better idea is to use \s* (0 or more whitespaces) or \s+ (1 or more occurrences).

Use

re.search(r'\[\?]\s*(.*?)Reduced', example_string)

See the regex demo.

import re
rx = r"\[\?]\s*(.*?)Reduced"
s = "<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>"
m = re.search(rx, s)
if m:
    print(m.group(1))
# => I want this string.

See the Python demo.
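If the delimiters may change, escaping them by hand gets error-prone. As a rough sketch (the helper name between and its parameters are my own, not part of the question), re.escape() can build the same kind of pattern from arbitrary literal delimiters:

```python
import re

def between(text, left, right):
    """Return the text between the literal strings left and right, or None."""
    # re.escape() backslash-escapes regex metacharacters such as [ and ?
    pattern = re.escape(left) + r'\s*(.*?)' + re.escape(right)
    m = re.search(pattern, text)
    return m.group(1) if m else None

s = "<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>"
print(between(s, '[?]', 'Reduced'))  # I want this string.
```

Returning None when nothing matches avoids the AttributeError you would get from calling .group() on a failed search.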

#2


1  

The solution turned out to be:

target_quote_object = re.search('] (.*?)Reduced', example_string)
target_quote_text = target_quote_object.group(1)
print(target_quote_text)

However, Wiktor's solution is better.

#3


1  

Regex may not be necessary for this, provided your string is in a consistent format:

mystr = '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'

res = mystr.split('Reduced')[0].split('] ')[1]

# 'I want this string.'
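The chained split() above raises IndexError if '] ' is missing. A slightly more defensive variant, sketched here with str.partition() (the function name between_partition is hypothetical), returns None instead:

```python
def between_partition(text, left, right):
    """Return the stripped text between left and right, or None if either is missing."""
    _, found_left, rest = text.partition(left)
    if not found_left:
        return None
    middle, found_right, _ = rest.partition(right)
    if not found_right:
        return None
    return middle.strip()

mystr = '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'
print(between_partition(mystr, '[?]', 'Reduced'))  # I want this string.
```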

#4


1  

You could (or should) use a positive lookbehind, (?<=\[\?\]):

import re
pattern=r'(?<=\[\?\])(\s\w.+?)Reduced'

string_data='<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'

print(re.findall(pattern,string_data)[0].strip())

output:

I want this string.
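A possible variation, assuming there is always exactly one space after [?] as the question states: put the space inside the lookbehind and use a lookahead for Reduced. The whole match is then the target text, with no group and no strip() needed. Note that Python's re module requires lookbehinds to be fixed-width, which '[?] ' is.

```python
import re

string_data = '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'

# Lookbehind consumes nothing: the match starts right after "[?] "
# and stops right before "Reduced".
m = re.search(r'(?<=\[\?\] ).*?(?=Reduced)', string_data)
result = m.group(0) if m else None
print(result)  # I want this string.
```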

#5


0  

As another answer notes, regex might not be necessary here, although this approach may be too long-winded for Python. It uses one of the common string methods, find.

  • str.find(sub,start,end) returns the index (counted from the start of str) of the first occurrence of sub within the slice str[start:end], or -1 if none is found.

  • In each iteration, the index of [?] is retrieved, followed by the index of Reduced. The resulting substring is printed.

  • Every time a [?]...Reduced pair is found, the index is moved forward to the rest of the string, and the search continues from that index.

Code

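s = ' [?] Nice to meet you.Reduced  efweww  [?] Who are you? Reduced<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'

idx = s.find('[?]')
while idx != -1:  # compare by value; `is not -1` tests identity, not equality
    start = idx
    end = s.find('Reduced', idx)
    if end == -1:  # no closing Reduced after this [?]
        break
    # skip the 3 characters of '[?]' and trim surrounding whitespace
    print(s[start+3:end].strip())
    idx = s.find('[?]', end)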

Output

$ python splmat.py
Nice to meet you.
Who are you?
I want this string.
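For comparison, the same multi-match extraction can probably be done in one call with re.findall(). This is only a sketch: the optional \s* before Reduced is my addition so that the trailing space in "Who are you? " is trimmed without strip().

```python
import re

s = (' [?] Nice to meet you.Reduced  efweww  [?] Who are you? Reduced'
     '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>')

# With one capturing group, findall() returns a list of Group 1 values,
# one per non-overlapping match.
quotes = re.findall(r'\[\?]\s*(.*?)\s*Reduced', s)
for quote in quotes:
    print(quote)
```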
