JSON character encoding: is UTF-8 well supported by browsers, or should I use numeric escape sequences?

Date: 2021-12-02 00:19:02

I am writing a webservice that uses json to represent its resources, and I am a bit stuck thinking about the best way to encode the json. Reading the json rfc (http://www.ietf.org/rfc/rfc4627.txt) it is clear that the preferred encoding is utf-8. But the rfc also describes a string escaping mechanism for specifying characters. I assume this would generally be used to escape non-ascii characters, thereby making the resulting utf-8 valid ascii.

So let's say I have a JSON string that contains Unicode characters (code points) that are non-ASCII. Should my web service just UTF-8 encode it and return it, or should it escape all those non-ASCII characters and return pure ASCII?

I'd like browsers to be able to execute the results using JSONP or eval. Does that affect the decision? My knowledge of various browsers' JavaScript support for UTF-8 is lacking.

EDIT: I wanted to clarify that my main concern about how to encode the results is really about browser handling of the results. What I've read indicates that browsers may be sensitive to the encoding when using JSONP in particular. I haven't found any really good info on the subject, so I'll have to start doing some testing to see what happens. Ideally I'd like to only escape those few characters that are required and just utf-8 encode the results.

5 answers

#1


66  

All JSON parsers can handle proper UTF-8 just as well as the numeric escape sequences, as the JSON specification requires.
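
A quick way to check this for yourself, as a minimal sketch: it assumes the script file is saved as UTF-8 and runs in any environment with a native JSON object. The variable names are only illustrative.

```javascript
// The same JSON value written two ways: raw character vs. numeric escape.
const rawForm = '{"name":"café"}';
const escapedForm = '{"name":"caf\\u00e9"}';

const fromRaw = JSON.parse(rawForm);
const fromEscaped = JSON.parse(escapedForm);

// Both forms decode to the identical JavaScript string.
console.log(fromRaw.name === fromEscaped.name); // true
```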

The ability for JSON encoders to use the numeric escape sequences instead just offers you more choice. One reason you may choose the numeric escape sequences would be if a transport mechanism in between your encoder and the intended decoder is not binary-safe.

Another reason you may choose numeric escape sequences is to prevent certain characters from appearing in the stream, such as <, & and ", which may be interpreted as HTML if the JSON is placed into HTML without escaping, or if a browser wrongly interprets the response as HTML. This can be a defence against HTML injection or cross-site scripting (note: some characters MUST be escaped in JSON, including " and \).
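
As a rough sketch of that idea in JavaScript: `toHtmlSafeJson` is a hypothetical helper name, not part of any library, and it only covers <, > and &. It relies on the fact that these three characters can only occur inside string values in serialized JSON, so replacing them with \u escapes does not change the decoded data.

```javascript
// Hypothetical helper: serialize to JSON, then replace HTML-significant
// characters with their numeric escapes.
function toHtmlSafeJson(value) {
  const replacements = { "<": "\\u003c", ">": "\\u003e", "&": "\\u0026" };
  return JSON.stringify(value).replace(/[<>&]/g, (ch) => replacements[ch]);
}

const payload = { comment: "</script><b>hi</b> & bye" };
const safe = toHtmlSafeJson(payload);
console.log(safe);

// Any JSON parser decodes the escapes back to the original characters.
console.log(JSON.parse(safe).comment === payload.comment); // true
```

This is not a complete XSS defence on its own; it just keeps those characters out of the byte stream.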

Some frameworks, including PHP's implementation of JSON, by default emit numeric escape sequences on the encoder side for any character outside of ASCII. This is intended for maximum compatibility with limited transport mechanisms and the like. However, this should not be interpreted as an indication that JSON decoders have a problem with UTF-8.

So, I guess you could just decide which to use like this:

  • Just use UTF-8, unless your method of storage or transport between encoder and decoder is not binary-safe.

  • Otherwise, use the numeric escape sequences.
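
The rule above could be sketched like this. `toJson` and its `asciiOnly` option are made-up names for illustration, and the sketch assumes the caller knows whether the transport is binary-safe.

```javascript
// Serialize a value as JSON, optionally escaping every non-ASCII code unit.
function toJson(value, { asciiOnly = false } = {}) {
  const json = JSON.stringify(value);
  if (!asciiOnly) {
    return json; // binary-safe transport: send the text as UTF-8
  }
  // Replace each UTF-16 code unit above 0x7F with \uXXXX. Characters outside
  // the BMP become a surrogate pair of two escapes, which JSON permits.
  return json.replace(/[\u0080-\uffff]/g, (unit) =>
    "\\u" + unit.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const resource = { city: "Zürich", mood: "😀" };
console.log(toJson(resource));
// {"city":"Zürich","mood":"😀"}
console.log(toJson(resource, { asciiOnly: true }));
// {"city":"Z\u00fcrich","mood":"\ud83d\ude00"}
```

Either output parses back to the same object, so the choice only affects the bytes on the wire.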

#2


14  

I had a problem there. When I JSON-encode a string with a character like "é", every browser returns the same "é", except IE, which returns "\u00e9".

Then PHP's json_decode() fails if it finds a raw "é", so for Firefox, Opera, Safari and Chrome I have to call utf8_encode() before json_decode().

Note: in my tests, IE and Firefox are using their native JSON object; other browsers are using json2.js.

#3


11  

ASCII isn't in it any more. Using UTF-8 encoding means that you aren't using ASCII encoding. What you should use the escaping mechanism for is what the RFC says:

All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F)
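
To see that rule in practice, here is a small sketch using the native JSON object. The sample string is made up; the point is that only the quotation mark, the backslash and the control character get escaped, while "é" passes through as-is.

```javascript
// Sample text containing a quote, a backslash, a tab (control character)
// and a non-ASCII letter.
const text = 'say "hi" \\ tab:\t é';
const encoded = JSON.stringify(text);

console.log(encoded);

// The mandatory escapes are present, and é is left unescaped.
console.log(encoded.includes('\\"'));  // true
console.log(encoded.includes("\\\\")); // true
console.log(encoded.includes("\\t"));  // true
console.log(encoded.includes("é"));    // true
```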

#4


5  

I was facing the same problem. This works for me; please check it:

json_encode($array,JSON_UNESCAPED_UNICODE);

#5


0  

I had a similar problem with the é character. I think the comment "it's possible that the text you're feeding it isn't UTF-8" is probably close to the mark here. I have a feeling the default collation in my instance was something else until I noticed and changed it to utf8. The problem is that the data was already there, so I'm not sure whether it was converted when I changed the collation, although it displays fine in MySQL Workbench. The end result is that PHP will not JSON-encode the data; json_encode() just returns false. It doesn't matter which browser you use, as it's the server causing my issue: PHP will not handle the data as UTF-8 if this character is present. As I say, I'm not sure whether this is due to converting the schema to utf8 after the data was present or just a PHP bug. In this case, use json_encode(utf8_encode($string));
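
For what it's worth, the "text isn't UTF-8" theory can be illustrated at the byte level. This is only a sketch in JavaScript, not the PHP/MySQL setup described above: it assumes the stored data was ISO-8859-1, which is the one case where a byte-to-code-point conversion (what utf8_encode() does) gives the right result. `tryDecode` is a hypothetical helper name.

```javascript
// "é" encoded as ISO-8859-1 is the single byte 0xE9; as UTF-8 it is 0xC3 0xA9.
const latin1Bytes = new Uint8Array([0xe9]);
const utf8Bytes = new Uint8Array([0xc3, 0xa9]);

// `fatal: true` makes the decoder throw on invalid input instead of
// silently substituting U+FFFD.
const strictUtf8 = new TextDecoder("utf-8", { fatal: true });

function tryDecode(bytes) {
  try {
    return strictUtf8.decode(bytes);
  } catch (err) {
    return null; // the bytes are not valid UTF-8
  }
}

console.log(tryDecode(utf8Bytes));   // é
console.log(tryDecode(latin1Bytes)); // null

// ISO-8859-1 maps each byte directly to the code point of the same value,
// so the data can be repaired by converting byte by byte.
const repaired = String.fromCharCode(...latin1Bytes);
console.log(Array.from(new TextEncoder().encode(repaired))); // [ 195, 169 ]
console.log(JSON.stringify({ name: repaired }));
```

If the data is actually in some other encoding, this conversion would produce the wrong characters, so it is worth checking what the database really stores first.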
