I have the following code:
a = "\u0432"
b = u"\u0432"
c = b"\u0432"
d = c.decode('utf8')
print(type(a), a)
print(type(b), b)
print(type(c), c)
print(type(d), d)
And the output:
<class 'str'> в
<class 'str'> в
<class 'bytes'> b'\\u0432'
<class 'str'> \u0432
Why do I see a character code in the last case instead of the character? How can I convert the byte string to a Unicode string so that the output shows the character rather than its code?
2 solutions
In strings (or Unicode objects in Python 2), \u has a special meaning: it says "here comes a Unicode character specified by its Unicode ID". Hence u"\u0432" results in the character в.
The b'' prefix tells you this is a sequence of 8-bit bytes, and a bytes object has no Unicode characters, so the \u code has no special meaning. Hence, b"\u0432" is just the sequence of the bytes \, u, 0, 4, 3 and 2.
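To make the six-byte claim concrete, here is a small sketch, assuming Python 3. The literal is written with a doubled backslash (my choice, not the question's) so it does not trigger an invalid-escape warning; it holds the same bytes as b"\u0432".

```python
# Same bytes as the question's b"\u0432": backslash, u, 0, 4, 3, 2.
c = b"\\u0432"

print(len(c))    # 6
print(list(c))   # [92, 117, 48, 52, 51, 50]

# Decoding as UTF-8 maps each ASCII byte to the same character,
# so the result is a six-character string, not the single letter.
d = c.decode("utf8")
print(len(d))    # 6
print(d)         # \u0432
```

This is why the fourth print in the question shows the escape text instead of в.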
Essentially you have an 8-bit string containing not a Unicode character, but the specification of a Unicode character.
You can convert this specification using the unicode_escape codec.
>>> c.decode('unicode_escape')
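As a script rather than a REPL line, the conversion might look like the sketch below (Python 3 assumed). One caveat worth keeping in mind: as far as I know, unicode_escape treats any non-escape bytes as Latin-1, so it is only a safe fit when the input is plain ASCII plus escapes, as it is here.

```python
# Same bytes as the question's c, written with a doubled backslash.
c = b"\\u0432"

# unicode_escape interprets the \uXXXX text inside the bytes
# and returns the character it specifies.
d = c.decode("unicode_escape")

print(type(d), d)   # <class 'str'> в
```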
Loved Lennart's answer. It put me on the right track for the particular problem I faced. What I added was the ability to produce HTML-compatible code for \u???? specifications in strings. Basically, only one line was needed:
results = results.replace('\\u','&#x')
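Note that the plain replace leaves the character reference without its closing semicolon, which HTML expects. A slightly stricter variant, offered only as a sketch, uses a regular expression to match exactly four hex digits and append the semicolon. The function name escapes_to_html is hypothetical, and the sketch does not combine surrogate pairs into a single reference.

```python
import re

def escapes_to_html(text):
    """Turn literal \\uXXXX sequences into HTML hex character references."""
    # Match a backslash, 'u', and exactly four hex digits; keep the digits.
    return re.sub(r"\\u([0-9a-fA-F]{4})", r"&#x\1;", text)

sample = "\\u0432 and \\u0433"
print(escapes_to_html(sample))   # &#x0432; and &#x0433;
```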
This all came about from a need to convert JSON results to something that displays well in a browser. Here is some test code that is integrated with a cloud application:
# References:
# http://*.com/questions/9746303/how-do-i-send-a-post-request-as-a-json
# https://docs.python.org/3/library/http.client.html
# http://docs.python-requests.org/en/v0.10.7/user/quickstart/#custom-headers
# http://*.com/questions/606191/convert-bytes-to-a-python-string
# http://www.w3schools.com/charsets/ref_utf_punctuation.asp
# http://*.com/questions/13837848/converting-byte-string-in-unicode-string
import urllib.request
import json
body = [ { "query": "co-development and language.name:English", "page": 1, "pageSize": 100 } ]
myurl = "https://core.ac.uk:443/api-v2/articles/search?metadata=true&fulltext=false&citations=false&similar=false&duplicate=false&urls=true&extractedUrls=false&faithfulMetadata=false&apiKey=SZYoqzk0Vx5QiEATgBPw1b842uypeXUv"
req = urllib.request.Request(myurl)
req.add_header('Content-Type', 'application/json; charset=utf-8')
jsondata = json.dumps(body)
jsondatabytes = jsondata.encode('utf-8') # needs to be bytes
req.add_header('Content-Length', len(jsondatabytes))
print('\n', jsondatabytes, '\n')
response = urllib.request.urlopen(req, jsondatabytes)
results = response.read()
results = results.decode('utf-8')
results = results.replace('\\u','&#x') # produces html hex version of \u???? unicode characters
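Since the response is JSON, another option is to let the json module interpret the \u???? escapes and then ask Python for HTML character references when encoding. The sketch below runs on a hypothetical sample string standing in for the decoded response body, so it makes no network call; the "title" key is an assumption, not something taken from the real API.

```python
import json

# Hypothetical stand-in for the decoded response text.
raw = '{"title": "\\u0432"}'

# json.loads turns the \uXXXX escape into the real character.
data = json.loads(raw)
title = data["title"]
print(title)        # в

# xmlcharrefreplace swaps non-ASCII characters for decimal references.
html_safe = title.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(html_safe)    # &#1074;
```

This produces decimal references (1074 is 0x0432) rather than the hex form, and both display the same way in a browser.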