Unfortunately I'm working on an obscure platform called uniPaaS, so I'm probably after some platform-agnostic advice.
I've got a Web Service request where the XML document contains those irritating smart quotes. The byte data for the character is E2 80 99 (which is U+2019 RIGHT SINGLE QUOTATION MARK).
When I write the XML file to disk on our staging server, it writes it correctly. When I write it on our production server, it totally changes the values of those bytes and malforms the XML document:
E2 80 99 becomes 92. Has anyone ever seen this sort of behaviour before? It seems to affect only that one byte sequence (but the SOAP response is 50 MB, so I haven't had a chance to diff the entire file).
1 answer
#1
6
It's encoding it as CP1251.
>>> b'\x92'.decode('cp1251').encode('utf-8')  # Python 3: bytes literal needed
b'\xe2\x80\x99'
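To narrow this down further, here is a minimal Python 3 sketch. It is not uniPaaS code, and the helper names `candidate_codecs` and `first_invalid_utf8_offset` are made up for illustration. The first helper checks which of a few single-byte code pages encode U+2019 as the byte 92 (several Windows code pages do, so CP1251 may not be the only candidate on the production server). The second reports where a buffer stops being valid UTF-8, which avoids diffing the whole 50 MB response.

```python
# Sketch only: helper names are hypothetical, not part of uniPaaS or any library.
RIGHT_SINGLE_QUOTE = "\u2019"  # U+2019 RIGHT SINGLE QUOTATION MARK


def candidate_codecs(byte_value, char=RIGHT_SINGLE_QUOTE,
                     codecs=("cp1250", "cp1251", "cp1252", "latin-1", "utf-8")):
    """Return the codecs from `codecs` that encode `char` as exactly `byte_value`."""
    matches = []
    for name in codecs:
        try:
            if char.encode(name) == byte_value:
                matches.append(name)
        except UnicodeEncodeError:
            # This codec has no mapping for the character at all.
            pass
    return matches


def first_invalid_utf8_offset(data):
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


if __name__ == "__main__":
    print(candidate_codecs(b"\x92"))

    sample = "<a>it\u2019s</a>"
    good = sample.encode("utf-8")   # what the staging server writes
    bad = sample.encode("cp1251")   # what the production server appears to write
    print(first_invalid_utf8_offset(good), first_invalid_utf8_offset(bad))
```

If the production output really is a Windows code page, the usual fix is to make the write step use UTF-8 explicitly (or write the raw bytes untouched) rather than relying on the server's default locale encoding. Where that setting lives in uniPaaS is not something this sketch can tell you.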