As far as I can tell, this isn't possible with the standard library json module. By default, json.dumps escapes all non-ASCII characters and then encodes the string to ASCII. I can tell it not to escape non-ASCII characters, but then it crashes when it tries to convert the output to ASCII.
The problem is - I don't want ASCII! I just want my JSON string back as a unicode (or UTF-8) string. Are there any convenient ways to do that?
Here's an example to demonstrate what I want:
d = {'navn': 'Åge', 'stilling': 'Lærling'}
json.dumps(d, output_encoding='utf8')
# => '{"stilling": "Lærling", "navn": "Åge"}'
But of course, there is no such option as output_encoding, so here's the actual output:
d = {'navn': 'Åge', 'stilling': 'Lærling'}
json.dumps(d)
# => '{"stilling": "L\\u00e6rling", "navn": "\\u00c5ge"}'
So to summarize: I want to convert a Python dict to a UTF-8 JSON string without any escapes. How can I do that?
I'll accept solutions like:
- Hacks (pre- and post-processing the input to dumps to achieve the desired effect)
- Subclassing the JSONEncoder (I have no idea how it works and the documentation isn't very helpful)
- Third-party libraries available on PyPI
2 solutions
#1
5
Requirements
- Make sure your Python files are encoded in UTF-8. Otherwise your non-ASCII characters will become question marks (?). Notepad++ has excellent encoding options for this.
- Make sure that you have the appropriate fonts installed. If you want to display Japanese characters, you need to install Japanese fonts.
- Make sure that your IDE supports displaying Unicode characters. Otherwise you might get a UnicodeEncodeError.
Example:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 22-23: character maps to <undefined>
PyScripter works for me. It's included with "Portable Python" at http://portablepython.com/wiki/PortablePython3.2.1.1
- Make sure you're using Python 3+, since this version offers better unicode support.
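As far as I can tell, the UnicodeEncodeError shown above usually comes from the encoding of the output stream (the console or IDE window), not from json itself. Here is a minimal sketch, assuming Python 3, of checking that encoding and of encoding text with a fallback instead of crashing. The helper name encode_for_stream is hypothetical and only for illustration.

```python
import sys


def encode_for_stream(text, encoding):
    """Encode text for a stream, escaping characters the codec cannot represent."""
    # errors="backslashreplace" substitutes backslash escapes instead of raising
    # UnicodeEncodeError for characters missing from the target codec.
    return text.encode(encoding, errors="backslashreplace")


# The console encoding depends on the environment (for example cp1252 or utf-8).
print(sys.stdout.encoding)

text = '{"Japan": "日本"}'
# UTF-8 can represent every character, so nothing gets escaped here.
print(encode_for_stream(text, "utf-8"))
# cp1252 has no Japanese characters, so they come out as backslash escapes.
print(encode_for_stream(text, "cp1252"))
```

If the first print shows something other than a UTF-8 encoding, printing non-ASCII JSON directly may fail in that environment.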
Problem
By default, json.dumps() escapes non-ASCII characters.
Solution
Read the update at the bottom. Or...
Replace each escaped character with the Unicode character it represents.
I created a simple lambda function called getStringWithDecodedUnicode that does just that.
import re
# Raw string: match a literal backslash-u followed by four hex digits, e.g. \u65e5
getStringWithDecodedUnicode = lambda s: re.sub(r'\\u([\da-f]{4})', lambda m: chr(int(m.group(1), 16)), s)
Here's getStringWithDecodedUnicode as a regular function.
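def getStringWithDecodedUnicode(value):
    # Match a literal backslash-u followed by four hex digits, e.g. \u65e5
    findUnicodeRE = re.compile(r'\\u([\da-f]{4})')

    def getParsedUnicode(x):
        # Convert the hex code point to the character it represents
        return chr(int(x.group(1), 16))

    return findUnicodeRE.sub(getParsedUnicode, str(value))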
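One caveat with the regular-expression approach: json.dumps writes characters outside the Basic Multilingual Plane (emoji, for instance) as two \uXXXX escapes, a surrogate pair, and replacing each escape on its own yields two lone surrogates instead of one character. Below is a hedged sketch of a variant that joins such pairs. The name decode_json_escapes is hypothetical and not part of the answer above. It also shares two limitations with the original: it would decode escaped control characters, and it would misread a literal backslash followed by "u" in the data.

```python
import json
import re

# Try a high+low surrogate pair first, then fall back to a single \uXXXX escape.
_ESCAPE_RE = re.compile(
    r'\\u(d[89ab][0-9a-f]{2})\\u(d[c-f][0-9a-f]{2})|\\u([0-9a-f]{4})'
)


def decode_json_escapes(json_text):
    """Replace \\uXXXX escapes in JSON text with the characters they represent."""
    def replace(match):
        high, low, single = match.groups()
        if single is not None:
            return chr(int(single, 16))
        # Combine the surrogate pair into one code point above U+FFFF.
        code = 0x10000 + ((int(high, 16) - 0xD800) << 10) + (int(low, 16) - 0xDC00)
        return chr(code)

    return _ESCAPE_RE.sub(replace, json_text)


original = {"Japan": "日本", "face": "😀"}
decoded = decode_json_escapes(json.dumps(original))
print(decoded)
# Round-trip check: the decoded text should still parse to the same data.
print(json.loads(decoded) == original)
```

The round-trip through json.loads is a cheap way to confirm that the post-processing did not corrupt the JSON.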
Example
testJSONWithUnicode.py (Using PyScripter as the IDE)
import re
import json
getStringWithDecodedUnicode = lambda s: re.sub(r'\\u([\da-f]{4})', lambda m: chr(int(m.group(1), 16)), s)
data = {"Japan":"日本"}
jsonString = json.dumps( data )
print( "json.dumps({0}) = {1}".format( data, jsonString ) )
jsonString = getStringWithDecodedUnicode( jsonString )
print( "Decoded Unicode: %s" % jsonString )
Output
json.dumps({'Japan': '日本'}) = {"Japan": "\u65e5\u672c"}
Decoded Unicode: {"Japan": "日本"}
Update
Or... just pass ensure_ascii=False as an option for json.dumps.
Note: You need to meet the requirements that I outlined at the beginning or else this isn't going to work.
import json
data = {'navn': 'Åge', 'stilling': 'Lærling'}
result = json.dumps(data, ensure_ascii=False)
print( result ) # prints {"navn": "Åge", "stilling": "Lærling"} (key order may differ before Python 3.7)
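In Python 3, json.dumps with ensure_ascii=False returns a str, so getting actual UTF-8 output takes one more step: encode the string, or write it to a file opened with an explicit encoding. A minimal sketch, assuming Python 3; the temporary file path is made up for the example.

```python
import json
import os
import tempfile

data = {'navn': 'Åge', 'stilling': 'Lærling'}

# A str containing the real characters rather than \uXXXX escapes.
text = json.dumps(data, ensure_ascii=False)

# Encode explicitly when bytes are needed (for example for a socket or HTTP body).
utf8_bytes = text.encode('utf-8')

# Hypothetical output location, used only for this example.
path = os.path.join(tempfile.mkdtemp(), 'data.json')

# Pass encoding='utf-8' so the result does not depend on the platform default.
with open(path, 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False)

# Reading the file back should give the original dict.
with open(path, encoding='utf-8') as f:
    round_tripped = json.load(f)

print(round_tripped == data)
```

Setting the encoding on open() matters mostly on Windows, where the default file encoding is often not UTF-8.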
#2
6
ensure_ascii=False is the best solution IMHO.
If you are using Python 2.7, here is an example Python file:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# example.py
from __future__ import unicode_literals
from json import dumps as json_dumps
d = {'navn': 'Åge', 'stilling': 'Lærling'}
print json_dumps(d, ensure_ascii=False).encode('utf-8')