XML UTF-8 data is written differently

Date: 2022-10-24 22:30:12

Unfortunately I'm working in an obscure platform called uniPaaS so I'm probably after some platform-agnostic advice.

I've got a Web Service request where the XML document contains those irritating smart quotes. The byte data for the character is E2 80 99 (the UTF-8 encoding of U+2019 RIGHT SINGLE QUOTATION MARK).

When I write the XML file to disk on our staging server, it writes it correctly. When I write it on our production server, it totally changes the values of those bytes and malforms the XML document:

E2 80 99 becomes 92. Has anyone ever seen this sort of behaviour before? It seems to affect only that one byte sequence (but the SOAP response is 50 MB, so I haven't had a chance to diff the entire file).
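Since diffing a 50 MB file by hand is impractical, one option is to scan the written file for bytes that are not valid UTF-8. The following is a minimal Python sketch, not a uniPaaS feature; the function name `find_invalid_utf8` and the `limit` parameter are made up for illustration, and it assumes the whole file fits in memory.

```python
def find_invalid_utf8(data: bytes, limit: int = 10):
    """Return up to `limit` (offset, byte) pairs where `data` is not valid UTF-8."""
    bad = []
    pos = 0
    while pos < len(data) and len(bad) < limit:
        try:
            # Decoding succeeds only if the rest of the data is valid UTF-8.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start is relative to the slice, so shift it by pos.
            offset = pos + err.start
            bad.append((offset, data[offset]))
            pos = offset + 1  # skip the offending byte and keep scanning
    return bad


good = b"it\xe2\x80\x99s"   # smart quote as UTF-8 (staging behaviour)
broken = b"it\x92s"         # smart quote collapsed to 0x92 (production behaviour)

good_report = find_invalid_utf8(good)
broken_report = find_invalid_utf8(broken)
```

For a real file, the bytes would come from something like `open(path, "rb").read()`, where `path` is whatever location the production server wrote to. Each reported offset can then be checked against the same position in the staging copy.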

1 answer

#1


6  

It's encoding it as CP1251.

>>> b'\x92'.decode('cp1251').encode('utf-8')  # Python 3: use a bytes literal
b'\xe2\x80\x99'
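To see the round trip in both directions, here is a small Python 3 sketch. It assumes the production server decodes the UTF-8 response and re-encodes it in a single-byte Windows code page before writing; that is an inference from the bytes in the question, not something confirmed about uniPaaS. Note that CP1251 and CP1252 both place U+2019 at 0x92, so this one byte cannot distinguish between them.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK as it arrives in the SOAP response.
utf8_bytes = b"\xe2\x80\x99"
text = utf8_bytes.decode("utf-8")

# What a Windows code page writer would produce for the same character.
as_cp1251 = text.encode("cp1251")
as_cp1252 = text.encode("cp1252")

# Reversing the damage: decode with the code page, re-encode as UTF-8.
repaired = as_cp1251.decode("cp1251").encode("utf-8")

print(as_cp1251.hex(), as_cp1252.hex(), repaired.hex())
```

If other non-ASCII characters appear in the file, comparing how they were written should show which code page is actually in use on the production server.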
