I thought I had mastered Unicode handling in Python 2, but it seems there's something I don't understand. I have this user input from HTML that goes to my Python script:
a = "m\xe9dico"
I want this to be médico (which means "doctor"). To convert it to Unicode, I'm doing:
a.decode("utf-8")
Or:
unicode(a, "utf-8")
But this raises:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)
How can I achieve this?
2 answers
#1
This is not UTF-8. The byte \xe9 is "é" in ISO-8859-1 (Latin-1), so decode it with that codec:
>>> print a.decode('iso8859-1')
médico
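To see why the UTF-8 decode fails while the Latin-1 one succeeds, here is a minimal sketch. It uses a b'' literal so it runs under both Python 2.6+ and Python 3; the helper name `try_decode` is hypothetical. The byte 0xE9 is "é" in ISO-8859-1, but in UTF-8 it can only start a three-byte sequence, and the following "d" is not a valid continuation byte.

```python
raw = b'm\xe9dico'  # the byte string from the question


def try_decode(data, encoding):
    """Hypothetical helper: return the decoded text, or None if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


latin1 = try_decode(raw, 'iso8859-1')  # every byte maps to one code point, so this succeeds
utf8 = try_decode(raw, 'utf-8')        # 0xE9 followed by 'd' is invalid UTF-8, so this is None
```

With a real byte string, the UTF-8 attempt fails with UnicodeDecodeError. The UnicodeEncodeError quoted in the question is different: in Python 2, calling .decode() on a value that is already unicode first encodes it with the default ASCII codec, and that implicit encode is what fails on u'\xe9'. So the input was most likely already a unicode object, in which case no decoding is needed at all.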
If you want a UTF-8-encoded byte string, use:
>>> a.decode('iso8859-1').encode('utf-8')
'm\xc3\xa9dico'
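The usual pattern is to decode bytes to text as soon as they arrive, work with text internally, and encode only when writing out. A minimal sketch of that round trip, assuming the input really is Latin-1 (it runs under Python 2.6+ and Python 3):

```python
raw = b'm\xe9dico'                   # Latin-1 bytes: one byte per character
text = raw.decode('iso8859-1')       # bytes -> text
utf8_bytes = text.encode('utf-8')    # text -> UTF-8 bytes; 'é' becomes two bytes, \xc3\xa9
```

The UTF-8 form is one byte longer than the Latin-1 form because "é" needs two bytes in UTF-8, and decoding it with 'utf-8' gives back the same text.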
#2
You can prefix your string with a u to mark it as a unicode literal:
>>> a = u'm\xe9dico'
>>> print a
médico
>>> type(a)
<type 'unicode'>
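In a u'' literal, \xe9 is an escape for the code point U+00E9 ("é"), not a raw byte, so several spellings produce the same one-character value. A small sketch that runs under Python 2.6+ and Python 3.3+ (the variable names are illustrative):

```python
import unicodedata

escaped = u'm\xe9dico'                                 # \xe9 escape, as in the answer
by_codepoint = u'm\u00e9dico'                          # same code point, \u form
by_name = u'm\N{LATIN SMALL LETTER E WITH ACUTE}dico'  # same code point, by Unicode name

same = escaped == by_codepoint == by_name
name = unicodedata.name(escaped[1])  # 'LATIN SMALL LETTER E WITH ACUTE'
```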
Or, to convert an existing byte string:
>>> a = 'm\xe9dico'
>>> type(a)
<type 'str'>
>>> new_a = unicode(a,'iso-8859-1')
>>> print new_a
médico
>>> type(new_a)
<type 'unicode'>
>>> new_a == u'm\xe9dico'
True
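Because input may arrive either as a byte string or as an already-decoded unicode object, a small helper that decodes only when needed avoids calling .decode() on text. A minimal sketch; the name `to_text` and the Latin-1 default are assumptions, and the right encoding depends on what the HTML page actually declares. It relies on `bytes` being an alias of `str` in Python 2.6+, so it runs under both Python 2 and 3.

```python
def to_text(value, encoding='iso-8859-1'):
    """Hypothetical helper: return `value` as text, decoding only if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text: decoding it again is what caused the error in Python 2
```

For example, to_text(b'm\xe9dico') and to_text(u'm\xe9dico') both return u'm\xe9dico'.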
Further reading: Python docs - Unicode HOWTO.