unicode().decode('utf-8', 'ignore') raising UnicodeEncodeError

Date: 2021-02-22 00:26:30

Here is the code:

>>> z = u'\u2022'.decode('utf-8', 'ignore')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2022' in position 0: ordinal not in range(256)

Why is UnicodeEncodeError raised when I am using .decode?

Why is any error raised when I am using 'ignore'?

3 answers

#1


59  

When I first started messing around with Python strings and unicode, it took me a while to understand the jargon of decode and encode too, so here's an earlier post of mine that may help:


Think of decoding as what you do to go from a regular bytestring to unicode and encoding as what you do to get back from unicode. In other words:

You de-code a str to produce a unicode string

and en-code a unicode string to produce a str.

So:

unicode_char = u'\xb0'

encodedchar = unicode_char.encode('utf-8')

encodedchar will contain your unicode character as a byte string (a str), encoded in the selected encoding (in this case, UTF-8).
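To make the direction of each call concrete, here is a minimal round-trip sketch. It is written in Python 3 spelling, where the unicode type is called str and the byte string type is called bytes, so it is an illustration of the idea rather than the exact Python 2 session from the question. The names encoded_char and decoded_char are mine.

```python
# Python 3 spelling: `str` is the unicode type, `bytes` is the byte string.
unicode_char = u'\xb0'  # DEGREE SIGN, a single code point

# encode: unicode text -> bytes in the chosen encoding
encoded_char = unicode_char.encode('utf-8')

# decode: bytes -> unicode text, using the same encoding
decoded_char = encoded_char.decode('utf-8')

print(repr(encoded_char))            # b'\xc2\xb0'
print(decoded_char == unicode_char)  # True
```

The one code point becomes two bytes in UTF-8, and decoding those two bytes gives back the original character.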

#2


16  

From http://wiki.python.org/moin/UnicodeEncodeError

Paradoxically, a UnicodeEncodeError may happen when decoding. The cause of it seems to be the coding-specific decode() functions that normally expect a parameter of type str. It appears that on seeing a unicode parameter, the decode() functions "down-convert" it into str, then decode the result assuming it to be of their own coding. It also appears that the "down-conversion" is performed using the ASCII encoder. Hence an encoding failure inside a decoder.
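The two-step behaviour described above can be emulated in a few lines. This is only a sketch of what the wiki says Python 2 appears to do, not the interpreter's actual implementation; py2_style_decode and its default_encoding parameter are hypothetical names of mine. The traceback in the question mentions 'latin-1' rather than ASCII, which suggests that interpreter's default encoding had been changed, so the parameter is there to model either case. The code is in Python 3 spelling so that it runs on a current interpreter.

```python
def py2_style_decode(text, encoding, errors='strict', default_encoding='ascii'):
    """Emulate the implicit encode that seems to precede unicode.decode()."""
    # Step 1 (implicit, always strict): unicode -> bytes via the default codec.
    raw = text.encode(default_encoding)
    # Step 2 (explicit): bytes -> unicode. `errors` only applies to this step.
    return raw.decode(encoding, errors)


failure = None
try:
    # 'ignore' does not help: the failure happens in step 1, before decoding.
    py2_style_decode(u'\u2022', 'utf-8', 'ignore')
except UnicodeEncodeError as exc:
    failure = exc

print(type(failure).__name__)  # UnicodeEncodeError
```

This also answers the second question: the 'ignore' handler is passed to the decoder, but the exception comes from the strict encode that runs first.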

#3


3  

You're trying to decode a unicode object. The implicit encoding needed to make the decode work is what's failing.
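One way to avoid the problem is to decode only values that are actually byte strings and pass text through untouched. A minimal sketch, assuming a helper of my own called to_text (not something from the question or from the standard library):

```python
def to_text(value, encoding='utf-8', errors='strict'):
    """Return unicode text, decoding only if `value` is a byte string."""
    if isinstance(value, bytes):
        # Real bytes: decoding is the right operation, and `errors` applies.
        return value.decode(encoding, errors)
    # Already text: nothing to decode, so no implicit encode can fail.
    return value


bullet_from_bytes = to_text(b'\xe2\x80\xa2')  # UTF-8 bytes for U+2022
bullet_from_text = to_text(u'\u2022')         # already unicode
print(bullet_from_bytes == bullet_from_text)  # True
```

With this guard, 'ignore' does what the asker expected when the input really is bytes: to_text(b'\xff\xe2\x80\xa2', errors='ignore') drops the invalid byte and keeps the bullet.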
