I have some strings that look like this:
<a href="javascript:updateParent('higashino/index.html')">東野 圭吾「夢幻花」「白夜行」</a>他<br>
Now I want to extract the link and the strings inside the corner brackets ("「" and "」"), like this:
['higashino/index.html', '夢幻花', '白夜行']
I've tried:
import re
str = u'''<a href="javascript:updateParent('higashino/index.html')">東野 圭 吾「夢幻花」「白夜行」</a>他<br>'''
myre = re.compile(ur'''\('(.*)'\)">.*「(.*?)」.*''', re.UNICODE)
myre.findall(str)
The result is:
['higashino/index.html', '白夜行']
Then I tried the pattern \('(.*)'\)">.*「([^」]*)」.* but the result was the same: only one element inside the corner brackets was found.
How can I get not just one, but all elements inside the corner brackets? Thanks.
2 Solutions
#1
0
Use re.findall() (or re.finditer) with the regex 「([^」]*?)」:
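import re
# Sample input from the question; named `text` so the built-in `str` is not shadowed.
text = '''<a href="javascript:updateParent('higashino/index.html')">東野 圭 吾「夢幻花」「白夜行」</a>他<br>'''
# Capture whatever sits between each pair of corner brackets.
match = re.findall(r'「([^」]*?)」', text)
print(match)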
Giving:
['夢幻花', '白夜行']
This was run with Python 3. If you are not already using Python 3, I recommend switching, as it handles Unicode strings much better than Python 2.
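The pattern in the question returns only the last title because it describes a single match covering the whole string: the greedy .* after "> runs forward and backtracks only as far as the last 「, and a capture group holds one value per match. A rough sketch of one way to get the link as well, assuming Python 3 and using two separate searches (the function name extract_link_and_titles is made up for illustration, not part of the question):

```python
import re

# Sample input from the question.
html = '''<a href="javascript:updateParent('higashino/index.html')">東野 圭吾「夢幻花」「白夜行」</a>他<br>'''


def extract_link_and_titles(text):
    """Return [link, title1, title2, ...] for one anchor string (hypothetical helper)."""
    # Lazy group: stop at the first closing quote followed by a parenthesis.
    link_match = re.search(r"\('(.*?)'\)", text)
    link = [link_match.group(1)] if link_match else []
    # findall scans left to right and returns every non-overlapping match.
    titles = re.findall(r'「([^」]*)」', text)
    return link + titles


print(extract_link_and_titles(html))
# ['higashino/index.html', '夢幻花', '白夜行']
```

Splitting the work into two searches keeps each pattern short, and an input with no link or no brackets simply contributes nothing to the list.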
#2
0
>>> myre = re.compile(ur'''(?<=\(').+?(?='\)">)|(?<=「)[^」]+''', re.UNICODE)
>>> myre.findall(str)
[u'higashino/index.html', u'\u5922\u5e7b\u82b1', u'\u767d\u591c\u884c']
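The session above is Python 2: the ur'' prefix is a syntax error in Python 3. A possible Python 3 version of the same single-pattern approach is sketched below, assuming the input is an ordinary str (the variable names text and pattern are illustrative). The re.UNICODE flag is dropped because it is already the default for str patterns in Python 3.

```python
import re

# Sample input from the question.
text = '''<a href="javascript:updateParent('higashino/index.html')">東野 圭吾「夢幻花」「白夜行」</a>他<br>'''

# Two alternatives, both using lookarounds so that only the wanted text is matched:
#   1. whatever sits between (' and ')">   -> the link
#   2. whatever follows 「 up to the next 」 -> each title
pattern = re.compile(r'''(?<=\(').+?(?='\)">)|(?<=「)[^」]+''')

result = pattern.findall(text)
print(result)
# ['higashino/index.html', '夢幻花', '白夜行']
```

Because the pattern has no capture groups, findall returns the full text of each match, so the link and the titles end up in one flat list in the order they appear.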