My string is Niệm Bồ Tát (Thiá»n sÆ° Nhất Hạnh)
and I want to decode it to Niệm Bồ Tát (Thiền sư Nhất Hạnh)
I see that this site can do it: http://www.enderminh.com/minh/utf8-to-unicode-converter.aspx
So I started trying it in Python:
mystr = '09. BÃ¡t NhÃ£ TÃ¢m Kinh'
mystr.decode('utf-8')
But the result is not correct: the original string is UTF-8, yet the string shown is not what I expect.
Note: these are Vietnamese characters.
How can I resolve this? Is it Windows Unicode or something? How do I detect the encoding here? Thanks in advance.
2 Solutions
#1
8
I'm not sure what you can do with this kind of data, but for the example in your original post, this works:
>>> mystr = '09. BÃ¡t NhÃ£ TÃ¢m Kinh'
>>> s = mystr.decode('utf8').encode('latin1').decode('utf8')
>>> s
u'09. B\xe1t Nh\xe3 T\xe2m Kinh'
>>> print(s)
09. Bát Nhã Tâm Kinh
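The session above is Python 2, where a plain string literal is a byte string and has a .decode method. In Python 3, str has no .decode, so a rough equivalent of the same round trip might look like the sketch below. The helper name fix_mojibake is hypothetical, and the sketch assumes the text was UTF-8 that got misread as Latin-1 or Windows-1252; it leaves the input alone if that assumption does not hold.

```python
def fix_mojibake(text):
    """Undo UTF-8 text that was mistakenly decoded as Latin-1 or Windows-1252.

    Hypothetical helper for illustration; returns the input unchanged
    when neither round trip succeeds.
    """
    for wrong_codec in ("latin-1", "cp1252"):
        try:
            # Recover the original bytes, then decode them as the UTF-8 they really are.
            return text.encode(wrong_codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
    # The text does not look like this kind of mojibake, so keep it as is.
    return text


broken = "09. BÃ¡t NhÃ£ TÃ¢m Kinh"
print(fix_mojibake(broken))  # 09. Bát Nhã Tâm Kinh
```

Text that is already correct fails the round trip and comes back unchanged, so calling the helper twice is harmless. A string that is only partly garbled will also come back unchanged, because the clean Vietnamese characters cannot be encoded as Latin-1.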
#2
8
The only thing that helped me with a broken Cyrillic string was ftfy: https://github.com/LuminosoInsight/python-ftfy
This module fixes pretty much everything and works much better than online decoders.
>>> from ftfy import fix_encoding
>>> mystr = '09. BÃ¡t NhÃ£ TÃ¢m Kinh'
>>> fix_encoding(mystr)
'09. Bát Nhã Tâm Kinh'
It can be installed easily with pip install ftfy.