So, I have this code to fetch a JSON string from a URL:
import json
import urllib2

url = 'http://....'
response = urllib2.urlopen(url)
string = response.read()
data = json.loads(string)
for x in data:
    print x['foo']
The problem is x['foo']: if I try to print it as shown above, I get this error:
Warning: Incorrect string value: '\xE4\xB8\xBA Co...' for column 'description' at row 1
If I use x['foo'].decode("utf-8"), I get this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e3a' in position 0: ordinal not in range(128)
If I try encode('ascii', 'ignore').decode('ascii'), then I get this error:
x['foo'].encode('ascii', 'ignore').decode('ascii')
AttributeError: 'NoneType' object has no attribute 'encode'
Is there any way to fix this problem?
1 answer
#1
x['foo'].decode("utf-8") resulting in a UnicodeEncodeError means that x['foo'] is already of type unicode. str.decode takes a str (a byte string) and translates it to unicode. When you call decode on a unicode object instead, Python 2 tries to be helpful: it first implicitly converts your unicode to str so that it has something to decode. It does this with the default encoding (the one reported by sys.getdefaultencoding()), which is ascii. ASCII can't encode most of Unicode, including u'\u4e3a', so that hidden encode step fails. That is why you get an Encode error even though you called decode.
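To see the hidden step in isolation, here is a minimal sketch that performs the same ascii encode Python 2 attempts behind the scenes. The helper name implicit_ascii_step is hypothetical, and the code is written to run under both Python 2.7 and Python 3 (on Python 3, sys.getdefaultencoding() reports 'utf-8' rather than 'ascii').

```python
# Runs under Python 2.7 and Python 3.
import sys


def implicit_ascii_step(text):
    """Hypothetical helper: repeat the hidden first step that Python 2
    performs when .decode() is called on a unicode object, namely
    encoding it with the ascii codec. Returns the failure reason, or
    None if the text is pure ASCII."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError as exc:
        return exc.reason
    return None


if __name__ == '__main__':
    # 'ascii' on a stock Python 2, 'utf-8' on Python 3
    print(sys.getdefaultencoding())
    print(implicit_ascii_step(u'plain ascii'))
    print(implicit_ascii_step(u'\u4e3a Co'))
```

The second call fails for the same reason as x['foo'].decode("utf-8") in the question: u'\u4e3a' has no ASCII representation.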
The solution here is to remove the decode call: the value is already unicode.
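If a code path can receive either byte strings or unicode, one way to avoid ever calling decode on unicode is to decode only when the value really is bytes. A minimal sketch, assuming UTF-8 input bytes; as_text is a hypothetical name, and the check relies on bytes being an alias of str on Python 2.6+, so it runs on Python 2.7 and Python 3:

```python
def as_text(value, encoding='utf-8'):
    """Hypothetical helper: return value as unicode text.

    Only byte strings are decoded. Values that are already unicode
    (such as the strings json.loads returns) and None pass through
    untouched.
    """
    if isinstance(value, bytes):  # bytes is str on Python 2
        return value.decode(encoding)
    return value
```

For the JSON data in the question the guard never fires, which is the point: those values need no decoding at all.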
Read Ned Batchelder's presentation, Pragmatic Unicode. It will greatly improve your understanding of this and help you avoid similar errors in the future.
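It's worth noting that every string returned by json.load (or json.loads) will be unicode, not str. JSON null values come back as None, which has no encode method; that is where the AttributeError: 'NoneType' object has no attribute 'encode' in the question comes from.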
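A quick way to check this yourself is to parse a small JSON array and look at the type of each x['foo']. SAMPLE below is made-up data standing in for the response from 'http://....', and it includes a null to show where None comes from; foo_types is a hypothetical helper, and the code runs on Python 2.7 and Python 3.

```python
import json

# The text type: unicode on Python 2, str on Python 3.
TEXT_TYPE = type(u'')

# Made-up payload standing in for the real response body.
SAMPLE = '[{"foo": "\\u4e3a Co"}, {"foo": "plain"}, {"foo": null}]'


def foo_types(payload):
    """Return the Python type of x['foo'] for each record in a JSON array."""
    return [type(x['foo']) for x in json.loads(payload)]


if __name__ == '__main__':
    for t in foo_types(SAMPLE):
        print(t)
```

Both strings, ASCII or not, come back as the text type, and the null comes back as NoneType.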
Addressing the new question after edits:
When you print, you need bytes; unicode is an abstract concept. You need a mapping from the abstract unicode string to bytes. In Python terms, you must convert your unicode object to str. You can do this by calling encode with an encoding that tells it how to translate the abstract string into concrete bytes. Generally you want to use the utf-8 encoding.
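A small illustration of that mapping: encoding u'\u4e3a' (the character from the error message) with utf-8 produces exactly the three bytes \xe4\xb8\xba that appear in the warning in the question. The helper name to_bytes is hypothetical; the code runs on Python 2.7 and Python 3.

```python
def to_bytes(text, encoding='utf-8'):
    """Hypothetical helper: map an abstract unicode string to concrete bytes."""
    return text.encode(encoding)


if __name__ == '__main__':
    encoded = to_bytes(u'\u4e3a Co')
    # The CJK character becomes three bytes; ' Co' is plain ASCII and
    # maps to one byte per character.
    print(repr(encoded))
```

Four characters go in and six bytes come out, which is why an encoding has to be named: the same string maps to different bytes under different encodings.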
This should work:
print x['foo'].encode('utf-8')
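That line still raises AttributeError: 'NoneType' object has no attribute 'encode' for any record whose foo is JSON null. Below is a minimal sketch of the loop that skips such records; skipping them is an assumption, and substituting a placeholder would work just as well. printable_foo is a hypothetical name, and the code runs under Python 2.7 and Python 3.

```python
def printable_foo(records, encoding='utf-8'):
    """Hypothetical helper: yield each x['foo'] encoded to bytes.

    Records whose 'foo' is None (JSON null) are skipped, because None
    has no .encode() method.
    """
    for x in records:
        value = x['foo']
        if value is None:
            continue
        yield value.encode(encoding)
```

On Python 2, the original loop then becomes `for raw in printable_foo(data): print raw`.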