I have many fill-in-the-blank sentences stored as strings, e.g. "6d) We took no [pains] to hide it ."
How can I efficiently parse such a string (in Python) so that it becomes "We took no to hide it"?
I would also like to store the word in brackets (e.g. "pains") in a list for later use. I think regular expressions could be a better fit than Python string operations like split().
5 solutions
#1
This will give you all the words inside the brackets.
import re
s = "6d) We took no [pains] to hide it ."
# Raw string so the backslashes reach the regex engine unchanged.
matches = re.findall(r'\[(.*?)\]', s)
Then you can run this to remove all bracketed words. Note that re.sub returns a new string, so keep the result:
new_s = re.sub(r'\[(.*?)\]', '', s)
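The two calls above leave the "6d)" label and a double space in the result. A minimal sketch that combines both steps and tidies the output follows; the helper name strip_blanks and the assumption that every label looks like word characters followed by ")" are mine, not part of the question.

```python
import re

BLANK = re.compile(r"\[(.*?)\]")     # one bracketed answer, non-greedy
LABEL = re.compile(r"^\s*\w+\)\s*")  # leading item label such as "6d) " (assumed format)

def strip_blanks(sentence):
    """Return (cleaned sentence, list of bracketed words)."""
    words = BLANK.findall(sentence)
    text = BLANK.sub("", sentence)
    text = LABEL.sub("", text)
    # Collapse the runs of whitespace left behind by the removed brackets.
    text = " ".join(text.split())
    return text, words

text, words = strip_blanks("6d) We took no [pains] to hide it .")
print(text)   # We took no to hide it .
print(words)  # ['pains']
```

Compiling the patterns once should help if you loop over many sentences, and a sentence without a label is simply left as it is.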
#2
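Just for fun, this does the gathering and the substitution in a single pass:
import re
s = "6d) We took no [pains] to hide it ."
matches = []
def subber(m):
    # Record the bracketed word, then replace the whole match with nothing.
    matches.append(m.group(1))
    return ""
new_text = re.sub(r"\[(.*?)\]", subber, s)
print(new_text)
print(matches)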
#3
import re
s = 'this is [test] string'
m = re.search(r"\[([A-Za-z0-9_]+)\]", s)
print(m.group(1))
Output
test
#4
For your example you could use this regex:
(.*\))(.+)\[(.+)\](.+)
You will get four groups that you can use to build the resulting string, and you can save the third group for later use:
6d)
We took no
pains
to hide it .
I used .+ here because I don't know whether your strings always look like your example. You can change .+ to an alphanumeric class or something more specific to your case.
import re
s = '6d) We took no [pains] to hide it .'
m = re.search(r"(.*\))(.+)\[(.+)\](.+)", s)
print(m.group(2) + m.group(4))  # " We took no  to hide it ." (the spaces around the bracket are kept)
print(m.group(3)) # pains
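As a sketch of what "more specific" could look like: the pattern below is hypothetical and assumes each string has one label ending in ")" and exactly one bracketed word made of word characters. It also trims the spaces around the bracket, so the pieces can be joined with a single space.

```python
import re

# Hypothetical tighter pattern: label, text before, one bracketed word, text after.
PATTERN = re.compile(r"^(\w+\))\s*([^\[\]]*?)\s*\[(\w+)\]\s*([^\[\]]*)$")

def split_sentence(s):
    """Return (label, before, word, after), or None if s does not fit the pattern."""
    m = PATTERN.match(s)
    if m is None:
        return None
    return m.groups()

parts = split_sentence("6d) We took no [pains] to hide it .")
print(parts)  # ('6d)', 'We took no', 'pains', 'to hide it .')
```

Returning None for strings that do not fit makes it easy to spot sentences with a different layout, which the looser .+ version would match in unexpected ways.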
#5
import re
m = re.search(r".*\) (.*)\[.*\] (.*)", "6d) We took no [pains] to hide it .")
if m:
    g = m.groups()
    print(g[0] + g[1])
Output:
We took no to hide it .