Converting \uXXXX escapes
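In Python 3.x:
- str.decode no longer exists, which is why Python 3.4 raises AttributeError: 'str' object has no attribute 'decode'.
- The Unicode string literal '\uxxxx\uxxxx' written in source code is different from a string whose content is the characters \uxxxx\uxxxx (a backslash, the letter u, and four hex digits for each escape). The interpreter turns the former into real characters when it parses the source; the latter is left exactly as it is.
If you don't understand what "literal" means, check the Python 3.x documentation.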
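The difference between a literal and an ordinary string that merely contains backslash-u sequences can be checked in a few lines. This is only an illustrative sketch; the variable names are made up for the example, and the raw string stands in for text that arrives at runtime (for instance from the command line).

```python
# A Unicode escape inside a normal string literal is resolved by the parser.
literal = '\u627e'        # one character
# A raw string keeps the backslash, the 'u' and the four hex digits as-is.
escaped = r'\u627e'       # six characters

print(len(literal))       # 1
print(len(escaped))       # 6
print(literal == escaped) # False

# In Python 3, str has no decode method, so calling it raises AttributeError.
try:
    escaped.decode('unicode_escape')
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```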
Usage:
./descape.py '\u627e\u4e0d\u5230\u8be5\u8bcd\u7684\u89e3\u91ca'
#!/usr/bin/env python3
# file: descape.py
# Convert escaped sequences like `\u45e3` in a plain string to the real characters.
import re
import sys


def h2d(a):
    # Convert a 4-digit hex string such as '627e' to the character it names.
    if len(a) != 4:
        raise ValueError('expected exactly 4 hex digits: %r' % a)
    return chr(int(a, 16))


def descape(utext):
    # utext is an ordinary str holding backslash-u sequences,
    # not a Unicode literal that the interpreter has already parsed.
    # Replace every \uXXXX match and leave all other text untouched.
    return re.sub(r'\\u([0-9a-fA-F]{4})', lambda m: h2d(m.group(1)), utext)


if __name__ == '__main__':
    print(descape(sys.argv[1]))
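As an alternative to the hand-written conversion above, the standard library's unicode_escape codec can do the same job. The sketch below is a possible shortcut, not a drop-in replacement: the function name descape_codec is hypothetical, it assumes the input is pure ASCII, and the codec also interprets other backslash escapes such as \n, which the script above does not.

```python
def descape_codec(text):
    # Assumption: text is pure ASCII; anything else raises UnicodeEncodeError.
    # str has no decode() in Python 3, so go through bytes first.
    return text.encode('ascii').decode('unicode_escape')


# The raw string stands in for an argument received from the command line.
escaped = r'\u627e\u4e0d\u5230'
result = descape_codec(escaped)
print(result)
```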
json module
json.dumps() and json.dump() take a parameter ensure_ascii. It defaults to True; set it to False and Chinese characters will no longer be encoded as \uxxxx.
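A short illustration of the ensure_ascii parameter, using a made-up dictionary as sample data:

```python
import json

data = {'msg': '\u627e\u4e0d\u5230'}  # sample data; the value is three Chinese characters

# Default (ensure_ascii=True): non-ASCII characters are written as \uXXXX escapes.
escaped_json = json.dumps(data)
print(escaped_json)

# ensure_ascii=False: the characters are written out unchanged.
plain_json = json.dumps(data, ensure_ascii=False)
print(plain_json)

# Both forms load back to the same object.
print(json.loads(escaped_json) == json.loads(plain_json))
```

When writing to a file with json.dump() and ensure_ascii=False, it is safest to open the file with an explicit encoding such as utf-8 so the characters are stored correctly.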