What does "Content-type: application/json; charset=utf-8" really mean?

Time: 2022-11-13 14:47:14

When I make a POST request with a JSON body to my REST service, I include Content-type: application/json; charset=utf-8 in the message headers. Without this header, I get an error from the service. I can also successfully use Content-type: application/json without the ;charset=utf-8 portion.

What exactly does charset=utf-8 do? I know it specifies the character encoding, but the service works fine without it. Does this encoding limit the characters that can be in the message body?

3 answers

#1


209  

The header just denotes what the content is and how it is encoded. It is not necessarily possible to deduce the type of the content from the content itself, i.e. you can't necessarily just look at the content and know what to do with it. That's what HTTP headers are for: they tell the recipient what kind of content they're (supposedly) dealing with.

Content-type: application/json; charset=utf-8 designates the content as JSON, encoded in the UTF-8 character encoding. Designating the encoding is somewhat redundant for JSON, since the default (only?) encoding for JSON is UTF-8. In this case the receiving server is apparently happy knowing that it's dealing with JSON and assumes UTF-8 by default, which is why it works with or without the charset parameter.
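
As a rough illustration of what a receiver might do with this header, here is a minimal Python sketch. The helper names `parse_content_type` and `decode_json_body` are hypothetical, and falling back to UTF-8 when no charset is given is an assumption about the server's behaviour, not something the question confirms.

```python
import json
from email.message import Message


def parse_content_type(header_value):
    """Split a Content-Type value into (media_type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


def decode_json_body(body, header_value):
    """Decode a request body according to its Content-Type header."""
    media_type, charset = parse_content_type(header_value)
    if media_type != "application/json":
        raise ValueError(f"unsupported media type: {media_type}")
    # Assumption: a missing charset parameter means UTF-8.
    text = body.decode(charset or "utf-8")
    return json.loads(text)


body = '{"city": "Zürich"}'.encode("utf-8")
with_charset = decode_json_body(body, "application/json; charset=utf-8")
without_charset = decode_json_body(body, "application/json")
```

Under that assumption both calls produce the same object, which would match the behaviour described in the question.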

Does this encoding limit the characters that can be in the message body?

No. You can send anything you want in the header and the body. But if the two don't match, you may get wrong results. If you specify in the header that the content is UTF-8 encoded but you're actually sending Latin1 encoded content, the receiver may produce garbage data by trying to interpret the Latin1 data as UTF-8. Of course, if you specify that you're sending Latin1 encoded data and you're actually doing so, then yes, you're limited to the 256 characters you can encode in Latin1.
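
The mismatch is easy to reproduce. The following is a small Python sketch (the sample strings are made up for illustration) showing what happens when the declared and actual encodings disagree, and why Latin1 restricts the character set while UTF-8 does not.

```python
text = "café"
latin1_bytes = text.encode("latin-1")  # 4 bytes, 'é' is the single byte 0xE9
utf8_bytes = text.encode("utf-8")      # 5 bytes, 'é' is 0xC3 0xA9

# Latin1 bytes declared as UTF-8: a strict decoder rejects them ...
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ... and a lenient one substitutes U+FFFD, losing the original character.
lenient = latin1_bytes.decode("utf-8", errors="replace")

# UTF-8 bytes declared as Latin1: decoding succeeds but yields mojibake.
mojibake = utf8_bytes.decode("latin-1")

# Latin1 cannot represent characters outside its 256 code points.
try:
    "€".encode("latin-1")
    euro_fits = True
except UnicodeEncodeError:
    euro_fits = False
```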

#2


126  

To substantiate @deceze's claim that the default JSON encoding is UTF-8...

From IETF RFC4627:

JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.

      00 00 00 xx  UTF-32BE
      00 xx 00 xx  UTF-16BE
      xx 00 00 00  UTF-32LE
      xx 00 xx 00  UTF-16LE
      xx xx xx xx  UTF-8
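
The table above translates fairly directly into code. Here is a minimal Python sketch; the function name `detect_json_encoding` is hypothetical, and falling back to UTF-8 for inputs shorter than four octets is an assumption, since the quoted text only covers the four-octet case.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON text from the null pattern of its first four octets."""
    if len(data) < 4:
        return "utf-8"  # assumption: too short for the table, use the default
    # True where the octet is non-zero ("xx"), False where it is 00.
    pattern = tuple(octet != 0 for octet in data[:4])
    table = {
        (False, False, False, True): "utf-32-be",  # 00 00 00 xx
        (False, True, False, True): "utf-16-be",   # 00 xx 00 xx
        (True, False, False, False): "utf-32-le",  # xx 00 00 00
        (True, False, True, False): "utf-16-le",   # xx 00 xx 00
    }
    return table.get(pattern, "utf-8")             # xx xx xx xx and anything else


sample = '{"a": 1}'
detected = {
    codec: detect_json_encoding(sample.encode(codec))
    for codec in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le")
}
```

This relies on the RFC 4627 guarantee that the first two characters are ASCII, and it expects input without a byte order mark.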

#3


15  

Note that IETF RFC4627 has been superseded by IETF RFC7158. In section [8.1] it retracts the text cited by @Drew earlier by saying:

Implementations MUST NOT add a byte order mark to the beginning of a JSON text.
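
For a sender, this mostly means choosing an encoder that does not emit a BOM. As a small Python sketch (the helper `strip_utf8_bom` is hypothetical, and stripping a BOM on the receiving side is a defensive choice made here, not something the quoted sentence requires):

```python
import codecs
import json

payload = {"ok": True}

# "utf-8" writes no byte order mark; "utf-8-sig" prepends EF BB BF.
plain = json.dumps(payload).encode("utf-8")
with_bom = json.dumps(payload).encode("utf-8-sig")


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM, if present, before decoding."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


cleaned = strip_utf8_bom(with_bom)
decoded = json.loads(cleaned.decode("utf-8"))
```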
