Convert an escaped string to a byte array or Stream (C#)

Time: 2022-08-01 18:26:04

My input string is a mixture of Unicode escape sequences and regular characters. Example:

\u0000\u0003\u0000\u0013timestamp\u0011clientId\u0015timeToLive\u0017destination\u000fheaders\tbody\u0013messageId\u0001\u0006

How can I convert this into a byte array or a Stream?

EDIT: UTF-8 encoding. To clarify the input string:

Char 01: U+0000
Char 02: U+0003
Char 03: U+0000
Char 04: U+0013
Char 05: t
Char 06: i
Char 07: m
Char 08: e
Char 09: s
Char 10: t
Char 11: a
Char 12: m
Char 13: p
Char 14: U+0011
...
...    

2 solutions

#1


4  

Okay, so you've got an arbitrary string (the fact that it contains non-printable characters is irrelevant) and you want to convert it into a byte array using UTF-8. That's easy :)

byte[] bytes = Encoding.UTF8.GetBytes(text);

Or to write to a stream, you'd normally wrap it in a StreamWriter:

// Note that due to the using statement, this will close the stream at the end
// of the block
using (var writer = new StreamWriter(stream))
{
    writer.Write(text);
}

(UTF-8 is the default encoding for StreamWriter, but you can specify it explicitly of course.)
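
If the stream needs to stay open after writing, or you want the encoding spelled out, something like the following sketch may help. The class and method names (StreamWriterSketch, WriteUtf8) are made up for illustration. It assumes .NET Framework 4.5 or later, where StreamWriter has a constructor overload with a leaveOpen parameter, and it uses a UTF8Encoding that does not emit a byte order mark, which matches what the default StreamWriter encoding does.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamWriterSketch
{
    // Hypothetical helper: writes text to the stream as UTF-8 without a
    // byte order mark, and leaves the stream open for the caller.
    public static void WriteUtf8(Stream stream, string text)
    {
        var utf8NoBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
        using (var writer = new StreamWriter(stream, utf8NoBom, bufferSize: 1024, leaveOpen: true))
        {
            writer.Write(text);
        } // Disposing the writer flushes it; leaveOpen keeps the stream usable.
    }

    public static void Main()
    {
        var stream = new MemoryStream();
        WriteUtf8(stream, "\u0000\u0003\u0000\u0013timestamp");

        // Four control characters plus nine ASCII letters, one byte each.
        Console.WriteLine(stream.Length); // prints 13
    }
}
```

Passing Encoding.UTF8 explicitly instead would typically make the writer emit a three-byte preamble at the start of an empty stream, which is usually unwanted when the output is meant to be a binary payload.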

I'm assuming you really have a good reason to have "text" in this form though. I can't say I've ever found a use for U+0003 (END OF TEXT). If, as I4V has suggested, this data was originally in a binary stream, you should avoid handling it as text in the first place. Separate out your binary data from your text data - when you mix them, it will cause issues. (For example, if the fourth character in your string were U+00FF, it would end up as two bytes when encoded to UTF-8, which probably wouldn't be what you wanted.)
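
To make the U+00FF point concrete, here is a small sketch comparing UTF-8 with ISO-8859-1 (Latin-1), which maps each character in the range U+0000 to U+00FF to exactly one byte. The class and method names are invented for this example, and whether a one-byte-per-character mapping is what the original binary format expects is an assumption; the cleaner fix remains not turning the binary data into a string at all.

```csharp
using System;
using System.Text;

public static class EncodingComparison
{
    // Encodes the text as UTF-8; characters above U+007F take two or more bytes.
    public static byte[] Utf8Bytes(string text)
        => Encoding.UTF8.GetBytes(text);

    // Encodes the text as ISO-8859-1; each character up to U+00FF becomes one byte.
    public static byte[] Latin1Bytes(string text)
        => Encoding.GetEncoding("ISO-8859-1").GetBytes(text);

    public static void Main()
    {
        // Same shape as the question's input, but with U+00FF as the fourth character.
        string text = "\u0000\u0003\u0000\u00FF";

        Console.WriteLine(BitConverter.ToString(Utf8Bytes(text)));   // 00-03-00-C3-BF
        Console.WriteLine(BitConverter.ToString(Latin1Bytes(text))); // 00-03-00-FF
    }
}
```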

#2


0  

To simplify the conversion just do this:

var stream = new MemoryStream(Encoding.UTF8.GetBytes(str));

Or, if you want something more reusable, create an extension method on string like this:

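using System.IO;
using System.Text;

public static class StringExtension
{
    // Encodes the string as UTF-8 and wraps the bytes in a MemoryStream.
    public static Stream ToStream(this string str)
        => new MemoryStream(Encoding.UTF8.GetBytes(str));

    // Or, much better: let the caller choose the encoding.
    public static Stream ToStreamWithEncoding(this string str, Encoding encoding)
        => new MemoryStream(encoding.GetBytes(str));
}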
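
As a rough usage sketch, the extension method can be called on any string and the resulting stream read back to check the round trip. The block repeats a one-method version of the extension class so that it compiles on its own; the Program class and the sample text are illustrative only. Byte order mark detection is switched off on the reader because the data starts with control characters rather than a preamble.

```csharp
using System;
using System.IO;
using System.Text;

public static class StringExtension
{
    // Encodes the string with the given encoding and wraps the bytes in a MemoryStream.
    public static Stream ToStreamWithEncoding(this string str, Encoding encoding)
        => new MemoryStream(encoding.GetBytes(str));
}

public static class Program
{
    public static void Main()
    {
        string text = "\u0000\u0003\u0000\u0013timestamp";

        using (Stream stream = text.ToStreamWithEncoding(Encoding.UTF8))
        using (var reader = new StreamReader(stream, Encoding.UTF8, detectEncodingFromByteOrderMarks: false))
        {
            // Decoding the stream with the same encoding should give back the original string.
            string roundTripped = reader.ReadToEnd();
            Console.WriteLine(roundTripped == text); // prints True
        }
    }
}
```

Returning Stream rather than MemoryStream keeps callers independent of the concrete type, at the cost of hiding MemoryStream-specific members such as ToArray.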
