I have the following string data:
data = "*****''[[dogs and cats]]''/n"
I would like to use regular expressions in Python to extract the words from this string. All the data is enclosed in the double quotes " ". What pattern should I use so that I get the following:
m = re.search(pattern, data)  # pattern is the part I am missing
print(m.group(1))
print(m.group(2))
print(m.group(3))
'dogs'
'and'
'cats'
Edit: So far I have something along the lines of this:
test = re.search("\\S*****''[[(.+) (.+) (.+)\\S]]''", "*****''[[dogs and cats]]''\n")
print test.group(1)
3 solutions
#1
"Some people, when confronted with a problem, think, 'I know, I'll use regular expressions.' Now they have two problems." (Jamie Zawinski)
data = "*****''[[dogs and cats]]''/n"
# locate the text between the double brackets without a regex
start = data.find('[[') + 2
end = data.find(']]', start)
answer = data[start:end].split()
print(answer[0])  # dogs
print(answer[1])  # and
print(answer[2])  # cats
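The same idea can be wrapped in a small helper that also copes with input that has no brackets at all. This is only a sketch: the function name `words_between_brackets` is made up for illustration, and it assumes a single `[[...]]` pair is of interest.

```python
def words_between_brackets(text):
    """Return the whitespace-separated words inside the first [[...]] pair."""
    start = text.find('[[')
    end = text.find(']]', start)
    # str.find returns -1 when the marker is missing
    if start == -1 or end == -1:
        return []
    return text[start + 2:end].split()


print(words_between_brackets("*****''[[dogs and cats]]''/n"))
print(words_between_brackets("no brackets here"))
```

Returning an empty list for missing brackets is a design choice made here; raising an exception would be just as reasonable.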
#2
It's hard to know exactly what you're looking for, but I will assume you are looking for a regex that parses out one or more space-separated words surrounded by some non-alphanumeric characters.
import re

data = "*****''[[dogs and cats]]''/n"
# this pulls out the 'dogs and cats' substring
interior = re.match(r'\W*([\w ]*)\W*', data).group(1)
words = interior.split()
print(words)
# => ['dogs', 'and', 'cats']
This makes a lot of assumptions about your requirements, though. Depending on exactly what you want, regular expressions may not be the best tool.
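To make one of those assumptions concrete, here is a small sketch that runs the same pattern on two inputs. The second input, with words before the brackets, is an assumed example of data the pattern was not designed for; the variable names are illustrative.

```python
import re

pattern = r'\W*([\w ]*)\W*'

# Input where only non-word characters precede the brackets
clean = "*****''[[dogs and cats]]''/n"
# Input where ordinary words appear before the brackets
noisy = "***rubbish**''[[dogs and cats]]''**more rubbish***"

clean_words = re.match(pattern, clean).group(1).split()
noisy_words = re.match(pattern, noisy).group(1).split()

print(clean_words)  # ['dogs', 'and', 'cats']
print(noisy_words)  # ['rubbish']
```

The pattern captures the first run of word characters and spaces it meets, so it only works when nothing but punctuation comes before the text you want.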
#3
As others said, this is fairly simple with one extra split step:
import re

data = "***rubbish**''[[dogs and cats]]''**more rubbish***"
# capture the text between [[ and ]], then split it on whitespace
words = re.findall(r'\[\[(.+?)\]\]', data)[0].split() # 'dogs', 'and', 'cats'
One single expression is also possible, but it looks rather confusing:
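rr = r'''
    (\w+)              # a word ...
    (?=                # ... followed by
        (?:
            (?!\[\[)   # no new opening [[
            .
        )*?
        \]\]           # before the closing ]]
    )
'''
# pass re.VERBOSE as a flag: an inline (?x) that is not at the very start
# of the pattern is rejected by recent Python versions
words = re.findall(rr, data, re.VERBOSE) # 'dogs', 'and', 'cats'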
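If exactly three words are expected, as in the question, one capture group per word gives the `group(1)`, `group(2)`, `group(3)` access that was asked for. This is a sketch under that assumption; it will not match if the brackets contain a different number of words or if the words are separated by anything other than a single space.

```python
import re

data = "*****''[[dogs and cats]]''\n"

# Escape the literal brackets; each (\w+) captures one word
match = re.search(r"\[\[(\w+) (\w+) (\w+)\]\]", data)

if match:
    print(match.group(1))  # dogs
    print(match.group(2))  # and
    print(match.group(3))  # cats
```

For a variable number of words, the findall-and-split approach above is the more forgiving choice.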