How do I convert from Unicode to ASCII?

Date: 2021-01-08 20:14:44

Is there any way to convert unicode values to ASCII?

5 answers

#1


6  

To simply strip the accents from unicode characters you can use something like:

string.Concat(input.Normalize(NormalizationForm.FormD).Where(
  c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));
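For anyone who wants to try the one-liner above, here is a minimal self-contained sketch. The class name `AccentStripper` and method name `StripAccents` are made up for illustration; only the `Normalize` / `GetUnicodeCategory` calls come from the answer.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper wrapping the expression from the answer above.
public static class AccentStripper
{
    public static string StripAccents(string input)
    {
        // FormD splits an accented letter into base letter + combining mark(s).
        string decomposed = input.Normalize(NormalizationForm.FormD);

        // Drop the combining marks (category NonSpacingMark), keep everything else.
        return string.Concat(decomposed.Where(
            c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));
    }
}
```

Calling `AccentStripper.StripAccents("eéêëèiïaâäàåcç")` should give `eeeeeiiaaaaacc`. Note that this only removes accents: letters with no canonical decomposition (for example `ß` or `ø`) pass through unchanged, so the result is not guaranteed to be pure ASCII.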

#2


3  

Technically, yes, you can, by using Encoding.ASCII.

Example (from byte[] to ASCII):

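// Convert the string to UTF-16 ("Unicode") bytes
byte[] uni = Encoding.Unicode.GetBytes("Whatever unicode string you have");
// Re-encode the UTF-16 bytes as ASCII; characters ASCII cannot represent become '?'
byte[] asciiBytes = Encoding.Convert(Encoding.Unicode, Encoding.ASCII, uni);
// Decode the ASCII bytes back into a string
string ascii = Encoding.ASCII.GetString(asciiBytes);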

Just remember that Unicode is a much larger standard than ASCII, and there will be characters that simply cannot be encoded correctly. Have a look here for tables and a little more information on the two encodings.
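If you would rather find out up front whether a string contains such characters, one option is a strict ASCII encoder that throws instead of substituting `?`. This is only a sketch: the class and method names (`AsciiCheck`, `IsPureAscii`) are invented for the example, and it assumes that detecting the problem is what you want rather than silently replacing characters.

```csharp
using System;
using System.Text;

// Hypothetical helper: reports whether a string survives ASCII encoding intact.
public static class AsciiCheck
{
    public static bool IsPureAscii(string input)
    {
        // Assumption: a strict encoder is wanted, so unencodable characters
        // raise EncoderFallbackException instead of turning into '?'.
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        try
        {
            strictAscii.GetBytes(input);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }
}
```

With this, `AsciiCheck.IsPureAscii("test")` is true and `AsciiCheck.IsPureAscii("tést")` is false.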

#3


3  

This workaround might better suit your needs. It strips the non-ASCII characters from a string and keeps only the ASCII ones.

byte[] bytes = Encoding.ASCII.GetBytes("eéêëèiïaâäàåcç  test");
char[] chars = Encoding.ASCII.GetChars(bytes);
string line = new String(chars);
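// Non-ASCII characters were encoded as '?'; remove them.
// Caution: this also removes any literal '?' that was in the input.
line = line.Replace("?", "");
// Results in "eiac test"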

Please note that the 2nd "space" in the input string is not a regular space: it is the character with code 255, which lies outside the 7-bit ASCII range (0-127), so it is removed as well.
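A variation on the same idea, in case the input may contain real question marks: give the encoder an empty replacement string so that unencodable characters are dropped directly, with no `?` step in between. This is a sketch rather than the answerer's code; `AsciiFilter` and `KeepAsciiOnly` are illustrative names.

```csharp
using System;
using System.Text;

// Hypothetical helper: drops non-ASCII characters without touching literal '?'.
public static class AsciiFilter
{
    public static string KeepAsciiOnly(string input)
    {
        // An empty replacement string means unencodable characters are
        // replaced with nothing, i.e. removed.
        Encoding ascii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback(string.Empty),
            new DecoderReplacementFallback(string.Empty));

        byte[] bytes = ascii.GetBytes(input);
        return ascii.GetString(bytes);
    }
}
```

For example, `AsciiFilter.KeepAsciiOnly("eéêëèiïaâäàåcç test?")` should return `eiac test?`, keeping the trailing question mark.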

#4


1  

Well, seeing as there are some 100,000+ Unicode characters and only 128 ASCII characters, a 1-1 mapping is obviously impossible.

You can use the Encoding.ASCII object to get the ASCII byte values from a Unicode string, though.
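To make that concrete, here is a small sketch of getting the byte values. The wrapper names (`AsciiBytes`, `ToAsciiBytes`, `Describe`) are made up for the example; the only real API involved is `Encoding.ASCII.GetBytes`.

```csharp
using System;
using System.Text;

// Hypothetical wrapper around Encoding.ASCII.GetBytes.
public static class AsciiBytes
{
    // One byte per character; characters ASCII cannot represent
    // come back as 63, the byte value of '?'.
    public static byte[] ToAsciiBytes(string input)
    {
        return Encoding.ASCII.GetBytes(input);
    }

    // Formats the byte values as a space-separated list for display.
    public static string Describe(string input)
    {
        return string.Join(" ", ToAsciiBytes(input));
    }
}
```

`AsciiBytes.Describe("Hi é")` should produce `72 105 32 63`: the `é` has no ASCII value, so it shows up as 63 (`?`).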

#5


1  

You CAN'T convert from Unicode to ASCII. Almost no character in Unicode can be expressed in ASCII, and those that can have exactly the same code points in ASCII as in Unicode, and the same byte values in UTF-8, which is probably what you have. About the only thing you can do that is even close to the right thing is to discard all characters above code point 127, and even that is very likely nowhere near what your requirements say. (The other possibility is to simplify accented or umlauted letters so that more than 128 characters become 'nearly' expressible, but that still doesn't even begin to cover Unicode.)
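Both points can be checked with a few lines of code. The sketch below is illustrative only (`CodePointFilter` and its method names are invented): one method discards everything outside the ASCII range by comparing code unit values, and the other confirms that ASCII-only text produces identical bytes in ASCII and UTF-8.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helpers illustrating the two claims in the answer above.
public static class CodePointFilter
{
    // Keeps only UTF-16 code units in the ASCII range (0-127);
    // everything else, including surrogate pairs, is discarded.
    public static string DropNonAscii(string input)
    {
        return new string(input.Where(c => c <= 127).ToArray());
    }

    // True when the ASCII and UTF-8 encodings of the input are byte-for-byte equal,
    // which holds exactly when the input contains only ASCII characters.
    public static bool SameBytesInAsciiAndUtf8(string input)
    {
        return Encoding.ASCII.GetBytes(input).SequenceEqual(Encoding.UTF8.GetBytes(input));
    }
}
```

`CodePointFilter.DropNonAscii("naïve café")` should return `nave caf`, which shows why simply discarding characters is rarely what the requirements call for. `SameBytesInAsciiAndUtf8("plain text")` is true, while `SameBytesInAsciiAndUtf8("café")` is false.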
