I have read through the other questions here, but I am still no closer. Sorry if this has already been answered, but I couldn't get anything proposed there to work.
>>> import re
>>> m = re.match(r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$', '/by_tag/xmas/xmas1.jpg')
>>> print m.groupdict()
{'tag': 'xmas', 'filename': 'xmas1.jpg'}
All is well. Then I try something with Norwegian characters in it (or something more Unicode-like):
>>> m = re.match(r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$', '/by_tag/påske/øyfjell.jpg')
>>> print m.groupdict()
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'groupdict'
How can I match typical Unicode characters, like øæå? I'd like to be able to match those characters as well, in both the tag group above and the one for the filename.
3 Answers
#1
38
You need to specify the re.UNICODE flag, and pass your input as a Unicode string by using the u prefix:
>>> re.match(r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$', u'/by_tag/påske/øyfjell.jpg', re.UNICODE).groupdict()
{'tag': u'p\xe5ske', 'filename': u'\xf8yfjell.jpg'}
This is in Python 2. In Python 3 you can leave out the u prefix because all strings are Unicode, and \w already matches Unicode word characters by default for str patterns.
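For readers on Python 3, here is a minimal sketch of the same match. It assumes a Python 3 interpreter, where no flag or prefix should be needed. The pattern is the one from the question; the name `PATTERN` and the escaped spelling of the path are only illustrative choices.

```python
import re

# Same pattern as in the question. In Python 3, \w is Unicode-aware
# for str patterns, so re.UNICODE does not have to be passed.
PATTERN = re.compile(r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$')

# '/by_tag/påske/øyfjell.jpg', written with escapes so the precomposed
# characters survive copy and paste.
path = '/by_tag/p\u00e5ske/\u00f8yfjell.jpg'

m = PATTERN.match(path)
groups = m.groupdict() if m else None  # guard against the None from the question
print(groups)
```

Checking `m` before calling `groupdict()` avoids the AttributeError shown in the question when a path does not match.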
#2
9
You need the re.UNICODE flag:
m = re.match(r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$', '/by_tag/påske/øyfjell.jpg', re.UNICODE)
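Whether the flag helps depends on the input being text rather than encoded bytes. The sketch below illustrates this in Python 3 (an assumption, since the thread is about Python 2): a bytes pattern only treats ASCII letters as \w, while the decoded string matches. The names `raw_path`, `BYTES_PATTERN` and `TEXT_PATTERN` are hypothetical.

```python
import re

# Hypothetical input: the path from the question as UTF-8 encoded bytes.
raw_path = '/by_tag/p\u00e5ske/\u00f8yfjell.jpg'.encode('utf-8')

BYTES_PATTERN = re.compile(rb'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$')
TEXT_PATTERN = re.compile(r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$')

# Bytes pattern: \w covers only [a-zA-Z0-9_], so the encoded å breaks the match.
bytes_match = BYTES_PATTERN.match(raw_path)

# Decode first, then match against text: \w is Unicode-aware here.
text_match = TEXT_PATTERN.match(raw_path.decode('utf-8'))

print(bytes_match)
print(text_match.groupdict())
```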
#3
5
In Python 2, you need the re.UNICODE flag and a Unicode string, built here with the unicode() constructor:
>>> re.sub(r"[\w]+","___",unicode(",./hello-=+","utf-8"),flags=re.UNICODE)
u',./___-=+'
>>> re.sub(r"[\w]+","___",unicode(",./cześć-=+","utf-8"),flags=re.UNICODE)
u',./___-=+'
>>> re.sub(r"[\w]+","___",unicode(",./привет-=+","utf-8"),flags=re.UNICODE)
u',./___-=+'
>>> re.sub(r"[\w]+","___",unicode(",./你好-=+","utf-8"),flags=re.UNICODE)
u',./___-=+'
>>> re.sub(r"[\w]+","___",unicode(",./你好，世界-=+","utf-8"),flags=re.UNICODE)
u',./___\uff0c___-=+'
>>> print re.sub(r"[\w]+","___",unicode(",./你好，世界-=+","utf-8"),flags=re.UNICODE)
,./___，___-=+
(In the last example, the comma between the two words is a full-width Chinese comma, U+FF0C, which \w does not match.)
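A rough Python 3 equivalent of the substitutions above, assuming a Python 3 interpreter where str patterns are Unicode-aware without any flag. The helper `mask_words` and its `ascii_only` parameter are hypothetical names for illustration; `re.ASCII` is the standard flag that restores ASCII-only matching.

```python
import re

def mask_words(text, ascii_only=False):
    """Replace each run of word characters with '___'.

    With ascii_only=True, re.ASCII restricts \\w to [a-zA-Z0-9_].
    """
    flags = re.ASCII if ascii_only else 0
    return re.sub(r"\w+", "___", text, flags=flags)

print(mask_words(",./hello-=+"))
print(mask_words(",./привет-=+"))
# With re.ASCII the Cyrillic letters are no longer word characters.
print(mask_words(",./привет-=+", ascii_only=True))
# \uff0c is the full-width comma; it is not a word character, so it stays.
print(mask_words(",./你好\uff0c世界-=+"))
```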