I have s = u'Gaga\xe2\x80\x99s'
but need to convert it to t = u'Gaga\u2019s'
How can this best be achieved?
3 solutions
#1
7
Wherever the original string was decoded, it was most likely decoded as latin-1 or a close relative. Since latin-1 corresponds to the first 256 code points of Unicode, encoding back to latin-1 recovers the original bytes, which can then be decoded as UTF-8:
>>> s = u'Gaga\xe2\x80\x99s'
>>> s.encode('latin-1').decode('utf8')
u'Gaga\u2019s'
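The round trip above can be wrapped in a small helper. This is only a sketch: the name `fix_mojibake` and its `wrong_encoding` parameter are hypothetical, and the fallback of returning the input unchanged is an assumption about what is wanted when the text is not mis-decoded UTF-8.

```python
def fix_mojibake(text, wrong_encoding="latin-1"):
    """Undo UTF-8 bytes that were mistakenly decoded as `wrong_encoding`.

    Hypothetical helper: returns `text` unchanged when the round trip fails,
    which usually means the text was not this kind of mojibake.
    """
    try:
        # Step 1: recover the original bytes. Step 2: decode them correctly.
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


fixed = fix_mojibake(u'Gaga\xe2\x80\x99s')
untouched = fix_mojibake(u'Gaga\u2019s')  # already correct, cannot encode to latin-1
# If the "close relative" was cp1252, the garbled text contains U+20AC and U+2122.
fixed_cp1252 = fix_mojibake(u'Gaga\xe2\u20ac\u2122s', wrong_encoding="cp1252")
```

Catching both exceptions keeps the helper safe to call on text that is already correct, at the cost of hiding genuine failures; drop the `try` if an error is preferable.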
#2
8
s = u'Gaga\xe2\x80\x99s'
t = u'Gaga\u2019s'
x = s.encode('raw-unicode-escape').decode('utf-8')
assert x==t
print(x)
yields
Gaga’s
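One difference from the latin-1 approach may be worth keeping in mind. The sketch below (Python 3 syntax, variable names are illustrative only) compares the two codecs: for code points below 256 they produce the same bytes, but for anything above U+00FF `raw-unicode-escape` writes a literal backslash escape where `latin-1` raises an error.

```python
mojibake = u'Gaga\xe2\x80\x99s'
already_ok = u'Gaga\u2019s'

# Below U+0100 both codecs emit one byte per code point, so they agree.
same_bytes = mojibake.encode('raw-unicode-escape') == mojibake.encode('latin-1')

# Above U+00FF, raw-unicode-escape emits the six ASCII characters \u2019
# rather than failing, so bad input passes through silently.
escaped = already_ok.encode('raw-unicode-escape')

# latin-1 cannot represent U+2019 and raises instead.
try:
    already_ok.encode('latin-1')
    latin1_raised = False
except UnicodeEncodeError:
    latin1_raised = True
```

Which behavior is preferable depends on whether mixed or already-correct input should fail loudly or pass through.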
#3
2
import codecs
s = u"Gaga\xe2\x80\x99s"
s_as_str = codecs.charmap_encode(s)[0]
t = unicode(s_as_str, "utf-8")
print repr(t)  # Python 2; a plain `print t` would display Gaga’s instead
prints
u'Gaga\u2019s'