Python unicode error. UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e3a'

Time: 2023-01-06 13:26:01

I have this code to fetch a JSON string from a URL:

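import json
import urllib2  # Python 2 standard library

url = 'http://....'
response = urllib2.urlopen(url)  # open the URL defined above
string = response.read()         # raw response body
data = json.loads(string)        # parse the JSON text

for x in data:
    print x['foo']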

The problem is x['foo']: when I try to print it as shown above, I get this error.

Warning: Incorrect string value: '\xE4\xB8\xBA Co...' for column 'description' at row 1

If I use x['foo'].decode("utf-8"), I get this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e3a' in position 0: ordinal not in range(128)

If I try encode('ascii', 'ignore').decode('ascii'), I get this error instead:

x['foo'].encode('ascii', 'ignore').decode('ascii')
AttributeError: 'NoneType' object has no attribute 'encode'

Is there any way to fix this problem?

1 Answer

#1


x['foo'].decode("utf-8") raising UnicodeEncodeError means that x['foo'] is of type unicode. str.decode takes a str and translates it to unicode. Python 2 tries to be helpful here: it implicitly converts your unicode to str so that decode can be called on it. It does this with the default encoding (the value returned by sys.getdefaultencoding()), which is ascii. ASCII can't encode all of Unicode, hence the exception.

The solution here is to remove the decode call - the value is already unicode.
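
To make the implicit step visible, here is a small sketch that performs it by hand. The sample JSON payload is made up for illustration, and the explicit encode('ascii') call stands in for what Python 2 does behind the scenes before decode runs. It is written so that it should behave the same on Python 2 and Python 3.

```python
import json

# Hypothetical sample payload; "\u4e3a" is the character from the error message.
data = json.loads('[{"foo": "\\u4e3a Co"}]')
value = data[0]['foo']  # already text: unicode on Python 2, str on Python 3

# Python 2 runs the equivalent of value.encode('ascii') before decode().
# Doing it explicitly reproduces the same UnicodeEncodeError.
try:
    value.encode('ascii')
    implicit_step_failed = False
    error_message = ''
except UnicodeEncodeError as exc:
    implicit_step_failed = True
    error_message = str(exc)

print(implicit_step_failed)
print(error_message)
```

Dropping the decode call avoids this hidden ASCII round trip entirely.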

Read Ned Batchelder's presentation - Pragmatic Unicode - it will greatly enhance your understanding of this and help prevent similar errors in the future.

It's worth noting here that every string returned by json.load (or json.loads) will be unicode and not str.
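
A quick way to check this yourself is to inspect the types that come back. The payload below is a hypothetical sample, not the asker's real data; it also includes a JSON null to show that not every value is a string.

```python
import json

text_type = type(u'')  # unicode on Python 2, str on Python 3

# Hypothetical sample payload with one string value and one null value.
payload = '[{"foo": "\\u4e3a Co"}, {"foo": null}]'
data = json.loads(payload)

# Collect the type name of each 'foo' value.
kinds = [type(item['foo']).__name__ for item in data]
print(kinds)
```

The string comes back as text, while null comes back as None, which has neither encode nor decode.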


Addressing the new question after edits:

When you print, you need bytes: unicode is an abstract concept. You need a mapping from the abstract unicode string into bytes. In Python terms, you must convert your unicode object to str. You can do this by calling encode with an encoding that tells it how to translate from the abstract string into concrete bytes. Generally you want to use the utf-8 encoding.
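
As a rough illustration of "abstract text versus concrete bytes", the sketch below encodes the same character with two encodings. The variable names are made up for this example; the code should run on Python 2 and Python 3.

```python
text = u'\u4e3a'  # one abstract character, code point U+4E3A

utf8_bytes = text.encode('utf-8')       # three bytes in UTF-8
utf16_bytes = text.encode('utf-16-le')  # two bytes in UTF-16 (little endian)

# Decoding with the matching encoding recovers the same text.
round_trip = utf8_bytes.decode('utf-8')

print(repr(utf8_bytes))
print(repr(utf16_bytes))
print(round_trip == text)
```

The same text maps to different byte sequences depending on the encoding, which is why one has to be chosen explicitly.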

This should work:

print x['foo'].encode('utf-8')
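
If some items have no value for 'foo', a guard before encoding may help. The helper name to_utf8_bytes and the sample payload are hypothetical; this is only a sketch of the idea, written to run on Python 2 and Python 3.

```python
import json

def to_utf8_bytes(value):
    """Return UTF-8 bytes for a text value, or None if the value is missing."""
    if value is None:  # JSON null becomes None, which has no .encode()
        return None
    return value.encode('utf-8')

# Hypothetical sample payload: one text value and one null.
payload = '[{"foo": "\\u4e3a Co"}, {"foo": null}]'
encoded = [to_utf8_bytes(item.get('foo')) for item in json.loads(payload)]

for chunk in encoded:
    if chunk is not None:
        print(repr(chunk))
```

This would also sidestep the AttributeError from the question, which suggests that at least one item has a null 'foo'. That reading is an assumption, since the actual data is not shown.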
