Extract strings between double quotes

I'm reading a response from a source that is a journal or an essay, and I have the HTML response as a string like:

According to some, dreams express "profound aspects of personality" (Foulkes 184), though others disagree.

My goal is just to extract all of the quotes out of the given string and save each of them into a list. My approach was:

[match.start() for m in re.Matches(inputString, "\"([^\"]*)\""))]

Somehow it didn't work for me. Any help with my regex? Thanks a lot.

2 solutions

#1

Provided there are no nested quotes:

re.findall(r'"([^"]*)"', inputString)

Demo:

>>> import re
>>> inputString = 'According to some, dreams express "profound aspects of personality" (Foulkes 184), though others disagree.'
>>> re.findall(r'"([^"]*)"', inputString)
['profound aspects of personality']
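Since the original attempt reached for match.start(), here is a minimal sketch of how the same pattern could be used with re.finditer to collect both the quoted text and its position. The helper name find_quotes is hypothetical and only for illustration; it assumes plain ASCII double quotes with no nesting.

```python
import re

def find_quotes(text):
    """Return (start_index, quoted_text) pairs for each "..." span in text.

    Assumes straight double quotes and no nested or escaped quotes.
    """
    pattern = re.compile(r'"([^"]*)"')
    # m.start(1) is the index of the first character inside the quotes.
    return [(m.start(1), m.group(1)) for m in pattern.finditer(text)]

input_string = ('According to some, dreams express "profound aspects of '
                'personality" (Foulkes 184), though others disagree.')

for start, quote in find_quotes(input_string):
    print(start, quote)
```

If only the text is needed, re.findall as shown above is the shorter option; finditer is useful when the match objects (positions, spans) matter.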

#2

Use this one if your input can have something like this: some "text \" and text" more

import re

s = '''According to some, dreams express "profound aspects of personality" (Foulkes 184), though others disagree.'''
# Lazily match up to the first closing quote that is not preceded by a backslash.
lst = re.findall(r'"(.*?)(?<!\\)"', s)
print(lst)

The negative lookbehind (?<!\\) checks that the closing " is not preceded by a backslash (\), so an escaped quote inside the text does not end the match.
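To see the lookbehind in action, the sketch below runs it on the escaped-quote input mentioned above. It also shows one possible alternative pattern that consumes backslash escapes as pairs; that alternative is not part of the original answer and is offered only as an assumption about how escaped backslashes (for example a string ending in \\) might be handled. The function names are hypothetical.

```python
import re

def extract_lookbehind(text):
    # The pattern from the answer: a quote only closes the match
    # when the character before it is not a backslash.
    return re.findall(r'"(.*?)(?<!\\)"', text)

def extract_escaped(text):
    # Alternative sketch: inside the quotes, accept either a character that
    # is neither a quote nor a backslash, or a backslash plus any character.
    return re.findall(r'"((?:[^"\\]|\\.)*)"', text)

# Input with an escaped quote inside the quoted span.
sample = r'some "text \" and text" more'
print(extract_lookbehind(sample))
print(extract_escaped(sample))

# Input where the quoted span ends in an escaped backslash (\\),
# so the quote that follows is a real closing quote.
tricky = r'a "ends with backslash\\" b "next"'
print(extract_lookbehind(tricky))
print(extract_escaped(tricky))
```

On the first input both patterns return the same single string. On the second, the lookbehind version treats the quote after \\ as escaped and runs on to the next quote, while the alternative closes the span there; which behaviour is right depends on whether the input actually uses backslash escaping.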
