Can I use Unicode to decode an HTTP request?

Date: 2022-06-08 02:06:02

I understand that the default encoding of an HTTP Request is ISO 8859-1.

Am I able to use Unicode to decode an HTTP request given as a byte array?

If not, how would I decode such a request in C#?

EDIT: I'm developing a server, not a client.

4 answers

#1


As you said, the default encoding of an HTTP POST request is ISO-8859-1. If the client used a different encoding, it is declared in the Content-Type header, which might then look like Content-Type: application/x-www-form-urlencoded; charset=UTF-8.

Once you have read the posted data into a byte array you may decide to convert this buffer to a string (remember all strings in .NET are UTF-16). It is only at that moment that you need to know the encoding.

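// ReadFromRequestStream(...) stands for however you read the request body.
byte[] buffer = ReadFromRequestStream(...);
// Replace the placeholder with the charset from Content-Type, or "ISO-8859-1" if none was sent.
string data = Encoding
              .GetEncoding("DETECTED ENCODING OR ISO-8859-1")
              .GetString(buffer);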
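
One way to arrive at the "detected encoding" is to parse the charset parameter of the Content-Type header. The following is a minimal sketch rather than a definitive implementation: the helper name EncodingFromContentType is hypothetical, the code is written as a C# 9+ top-level program, and it assumes the header value has already been pulled out of the request as a string. It uses System.Net.Mime.ContentType to do the parsing.

```csharp
using System;
using System.Net.Mime;
using System.Text;

// Usage: decode a small body according to the declared charset.
byte[] sampleBody = { 0x63, 0x61, 0x66, 0xC3, 0xA9 }; // "café" encoded as UTF-8
Encoding detected = EncodingFromContentType("application/x-www-form-urlencoded; charset=UTF-8");
string decodedBody = detected.GetString(sampleBody);
Console.WriteLine(decodedBody);

// Hypothetical helper: picks an Encoding from a Content-Type header value,
// falling back to ISO-8859-1 when no usable charset is declared.
static Encoding EncodingFromContentType(string contentTypeHeader)
{
    Encoding fallback = Encoding.GetEncoding("ISO-8859-1");
    if (string.IsNullOrWhiteSpace(contentTypeHeader))
        return fallback;
    try
    {
        string charset = new ContentType(contentTypeHeader).CharSet;
        return string.IsNullOrEmpty(charset) ? fallback : Encoding.GetEncoding(charset);
    }
    catch (FormatException) { return fallback; }   // header could not be parsed
    catch (ArgumentException) { return fallback; } // charset name is not known to .NET
}
```

Falling back silently is a design choice of this sketch; a stricter server might instead reject a request that names a charset it cannot handle.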

And to answer your question:

Am I able to use Unicode to decode an HTTP request given as a byte array?

Yes, if a Unicode encoding such as UTF-8 was used to encode this byte array:

string data = Encoding.UTF8.GetString(buffer);
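
Note that Encoding.UTF8 replaces invalid byte sequences with U+FFFD instead of failing, so a wrong guess can go unnoticed. If the charset is not declared, one possible heuristic (an assumption of this sketch, not something HTTP prescribes) is to try a strict UTF-8 decoder first and fall back to ISO-8859-1 when the bytes are not valid UTF-8. DecodeUtf8OrLatin1 is a hypothetical helper name.

```csharp
using System;
using System.Text;

Console.WriteLine(DecodeUtf8OrLatin1(new byte[] { 0x63, 0x61, 0x66, 0xC3, 0xA9 })); // valid UTF-8 input
Console.WriteLine(DecodeUtf8OrLatin1(new byte[] { 0x63, 0x61, 0x66, 0xE9 }));       // not valid UTF-8

// Hypothetical helper: strict UTF-8 first, ISO-8859-1 as the fallback.
static string DecodeUtf8OrLatin1(byte[] bytes)
{
    // throwOnInvalidBytes: true makes GetString throw instead of substituting U+FFFD.
    var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                      throwOnInvalidBytes: true);
    try
    {
        return strictUtf8.GetString(bytes);
    }
    catch (DecoderFallbackException)
    {
        // Every byte value maps to a character in ISO-8859-1, so this cannot fail.
        return Encoding.GetEncoding("ISO-8859-1").GetString(bytes);
    }
}
```

The heuristic can misfire on ISO-8859-1 data that happens to form valid UTF-8, so an explicit charset in the header should always take precedence.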

#2


You should not use a Unicode encoding to decode something that was not encoded with a Unicode encoding, because not all characters would be decoded correctly.

Create an Encoding object for the correct encoding and use that:

Encoding iso = Encoding.GetEncoding("iso-8859-1");
string request = iso.GetString(requestArray);
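
To make the failure mode concrete, here is a small illustrative sketch (the sample bytes are made up for the example) that decodes the same ISO-8859-1 bytes with the matching decoder and with UTF-8. It is written as a C# 9+ top-level program.

```csharp
using System;
using System.Text;

// "café" encoded as ISO-8859-1: the final byte 0xE9 is 'é'.
byte[] latin1Bytes = { 0x63, 0x61, 0x66, 0xE9 };

string asLatin1 = Encoding.GetEncoding("iso-8859-1").GetString(latin1Bytes);
string asUtf8 = Encoding.UTF8.GetString(latin1Bytes);

// The matching decoder recovers the original text.
Console.WriteLine(asLatin1 == "caf\u00E9");
// The UTF-8 decoder cannot interpret the lone 0xE9 and substitutes U+FFFD.
Console.WriteLine(asUtf8.IndexOf('\uFFFD') >= 0);
```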

#3



The code below should help. If you are expecting a large amount of data to stream down, then reading it asynchronously is the best way to go.

string myUrl = @"http://somedomain.com/file";
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(myUrl);

//Set some reasonable limits on resources used by this request
request.MaximumAutomaticRedirections = 4;
request.MaximumResponseHeadersLength = 4;
request.Timeout = 15000;

HttpWebResponse response = (HttpWebResponse)request.GetResponse();

Stream receiveStream = response.GetResponseStream();
Encoding encode = System.Text.Encoding.GetEncoding("utf-8");

StreamReader readStream = new StreamReader(receiveStream, encode);

char[] read = new char[512];

// Reads up to 512 characters at a time.
int count = readStream.Read(read, 0, 512);

while (count > 0)
{
  // Copies the characters just read into a string and displays it.
  string str = new string(read, 0, count);
  Console.Write(str);
  count = readStream.Read(read, 0, 512);
}

// Release the resources held by the reader and the response.
readStream.Close();
response.Close();
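
The snippet above reads synchronously from a client-side response, whereas the question is about a server. As a hedged sketch of the asynchronous variant, the following reads a request body from a stream with an explicit encoding. ReadBodyAsync is a hypothetical helper, and the MemoryStream stands in for whatever stream your server exposes for the request body.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

// Stand-in for the request body stream of a real server.
using var bodyStream = new MemoryStream(latin1.GetBytes("name=caf\u00E9"));
string bodyText = await ReadBodyAsync(bodyStream, latin1);
Console.WriteLine(bodyText);

// Hypothetical helper: reads the whole stream as text without blocking the thread.
static async Task<string> ReadBodyAsync(Stream body, Encoding encoding)
{
    // detectEncodingFromByteOrderMarks: false so the supplied encoding is always used;
    // leaveOpen: true so the caller keeps ownership of the stream.
    using var reader = new StreamReader(body, encoding,
                                        detectEncodingFromByteOrderMarks: false,
                                        bufferSize: 4096, leaveOpen: true);
    return await reader.ReadToEndAsync();
}
```

For very large bodies you would likely process the text in chunks rather than call ReadToEndAsync, which buffers everything in memory.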

#4


Every time .NET transfers information between an external representation (e.g. a TCP socket) and the internal Unicode format (or the other way around), some form of encoding is involved.
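
A short sketch of that point: the same in-memory string produces different bytes depending on the encoding chosen at the boundary. It also shows why "using Unicode" is ambiguous in .NET, where Encoding.Unicode specifically means UTF-16 little-endian. The sample string is made up for illustration.

```csharp
using System;
using System.Text;

string sample = "caf\u00E9"; // "café": four characters in memory

byte[] asLatin1Bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(sample);
byte[] asUtf8Bytes = Encoding.UTF8.GetBytes(sample);
byte[] asUtf16Bytes = Encoding.Unicode.GetBytes(sample); // UTF-16 little-endian

// Byte counts differ per encoding even though the text is identical.
Console.WriteLine($"{asLatin1Bytes.Length} {asUtf8Bytes.Length} {asUtf16Bytes.Length}");

// Decoding with the encoding that produced the bytes restores the original string.
Console.WriteLine(Encoding.UTF8.GetString(asUtf8Bytes) == sample);
```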

See the question "utf-8-vs-unicode", especially Jon Skeet's answer, which refers to Joel's article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)".
