How to convert an unreadable string back to UTF-8 bytes in C#

Time: 2021-01-21 20:13:10

I have a string that looks like "aeroport aim├⌐".

I know it is French, and I want to convert this string back into a readable format. Any suggestions?

3 answers

#1


Heh. It's a simple cryptanalysis task. Collect statistics of letter usage in your string: single letters, two-letter groups, or better, three-letter groups (trigrams). Then collect the same statistics on a large amount of French text on the same topic. Rank the French trigrams and the trigrams of your garbled text by frequency and match them up to decode your cryptogram. Of course it will be wrong at first, but then you can apply a dictionary to measure the failure rate and use some kind of genetic algorithm to find the best match.
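
As a rough illustration of the first step described above, here is a sketch that counts character trigrams in a string so that two texts can be ranked and compared. The names `TrigramStats`, `Count`, and `TopK` are hypothetical; the French reference corpus, the rank matching, and the dictionary/genetic-algorithm refinement are not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: trigram frequency statistics for the
// frequency-analysis approach. Garbled symbols such as '├' are counted
// like letters, because they are exactly what needs to be decoded.
public static class TrigramStats
{
    // Returns trigram -> occurrence count. Trigrams that contain
    // whitespace are skipped so they never span word boundaries.
    public static Dictionary<string, int> Count(string text)
    {
        var counts = new Dictionary<string, int>();
        string lower = text.ToLowerInvariant();
        for (int i = 0; i + 3 <= lower.Length; i++)
        {
            string gram = lower.Substring(i, 3);
            if (gram.Any(char.IsWhiteSpace))
                continue;
            counts.TryGetValue(gram, out int n); // n is 0 if not seen yet
            counts[gram] = n + 1;
        }
        return counts;
    }

    // The k most frequent trigrams, most frequent first; ties are broken
    // by ordinal string order so the ranking is deterministic.
    public static List<string> TopK(string text, int k) =>
        Count(text)
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .Take(k)
            .Select(kv => kv.Key)
            .ToList();
}
```

You would then compare `TrigramStats.TopK(garbled, 10)` with `TrigramStats.TopK(frenchCorpus, 10)`, where `frenchCorpus` stands for a hypothetical large French text.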

And by the way: if the text was originally UTF-8 but was 'forced' into a single-byte code page, you should operate on bytes, not on characters.
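
A minimal sketch of what operating on bytes means, assuming the single-byte code page was ISO-8859-1 (chosen only because .NET supports it without extra setup). The one character "é" becomes two UTF-8 bytes, those bytes show up as two characters, and only a byte-level round trip restores the original. `ByteLevelDemo` is a hypothetical name.

```csharp
using System.Text;

// Hypothetical demo class. ISO-8859-1 (Latin-1) is an assumed stand-in for
// "the wrong single-byte code page"; it maps every byte 0-255 to exactly
// one character, so the round trip below loses nothing.
public static class ByteLevelDemo
{
    static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Simulate the damage: UTF-8 bytes decoded with the wrong code page.
    // "é" (one char) -> bytes C3 A9 -> "Ã©" (two chars).
    public static string Mangle(string original) =>
        Latin1.GetString(Encoding.UTF8.GetBytes(original));

    // Undo it on the byte level: recover the raw bytes with the same wrong
    // code page, then decode those bytes as UTF-8.
    public static string Repair(string mangled) =>
        Encoding.UTF8.GetString(Latin1.GetBytes(mangled));
}
```

For example, `ByteLevelDemo.Repair("aimÃ©")` returns "aimé". Replacing "Ã©" with "é" character by character would also work for this one letter, but you would need such a rule for every affected character; the byte round trip handles all of them at once.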

#2


That is not French; the French word for "airport" is "aéroport".

If you want to convert the string to a readable format, you have to know which encoding the original string was in, not which language. "aeroport aim├⌐" is a legal UTF-8 string.
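
Since the encoding is what matters, one way to narrow it down is to try a few candidate code pages and keep only those under which the garbled text converts back to valid UTF-8 bytes. This is a sketch, not a detector: the candidate list (437, 850, 1252) and the name `EncodingGuesser` are assumptions, and it assumes .NET Core 3.0 or later, where legacy code pages must be registered through `CodePagesEncodingProvider`.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: for each candidate code page, turn the garbled text
// back into bytes and check whether those bytes are valid UTF-8.
public static class EncodingGuesser
{
    // Legacy code pages such as 437, 850 and 1252 are not available on
    // .NET Core / .NET 5+ until this provider is registered.
    static EncodingGuesser() =>
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    public static Dictionary<int, string> Candidates(string garbled, params int[] codePages)
    {
        // Strict UTF-8: throw on invalid byte sequences instead of inserting U+FFFD.
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                          throwOnInvalidBytes: true);
        var results = new Dictionary<int, string>();
        foreach (int cp in codePages)
        {
            // Strict legacy encoding: throw instead of silently writing '?'
            // for characters that do not exist in this code page.
            var legacy = Encoding.GetEncoding(cp,
                EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
            try
            {
                byte[] raw = legacy.GetBytes(garbled);
                results[cp] = strictUtf8.GetString(raw);
            }
            catch (EncoderFallbackException) { /* text not representable in this code page */ }
            catch (DecoderFallbackException) { /* resulting bytes are not valid UTF-8 */ }
        }
        return results;
    }
}
```

`EncodingGuesser.Candidates("aeroport aim├⌐", 437, 850, 1252)` keeps code page 437 (giving "aeroport aimé") and rejects 1252, which contains neither "├" nor "⌐". For other inputs more than one code page can survive, so treat the result as a shortlist to inspect.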

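Where are you seeing this string? On a Windows command prompt? The command prompt shows odd characters like "├⌐" for non-ASCII (so-called high-ASCII) characters because it uses CP437, not UTF-8. If you have the UTF-8 string "aimé", its last character is stored as the two bytes 0xC3 0xA9, which CP437 displays as "├" and "⌐", so the string shows up as "aim├⌐".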
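
To reproduce that effect, this sketch takes the UTF-8 bytes of "aimé", renders them through CP437, and then reverses the process. It assumes .NET Core 3.0 or later, where CP437 is only available after registering `CodePagesEncodingProvider`; `Cp437Demo` and its methods are hypothetical names.

```csharp
using System;
using System.Text;

// Hypothetical demo of UTF-8 bytes being displayed as CP437 and back.
public static class Cp437Demo
{
    static readonly Encoding Cp437;

    static Cp437Demo()
    {
        // Needed on .NET Core / .NET 5+ for legacy code pages like 437.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Cp437 = Encoding.GetEncoding(437);
    }

    // What the console effectively does: take UTF-8 bytes, show them as CP437.
    // 0xC3 is '├' and 0xA9 is '⌐' in CP437.
    public static string AsSeenInCp437(string text) =>
        Cp437.GetString(Encoding.UTF8.GetBytes(text));

    // The reverse: turn the CP437 characters back into bytes, decode as UTF-8.
    public static string FromCp437Garble(string garbled) =>
        Encoding.UTF8.GetString(Cp437.GetBytes(garbled));

    // Hex dump of the UTF-8 bytes, useful for seeing what is really stored.
    public static string Utf8Hex(string text) =>
        BitConverter.ToString(Encoding.UTF8.GetBytes(text));
}
```

`Cp437Demo.AsSeenInCp437("aimé")` returns "aim├⌐", `Cp437Demo.FromCp437Garble("aim├⌐")` returns "aimé", and `Cp437Demo.Utf8Hex("aimé")` returns "61-69-6D-C3-A9".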

If that is your situation, try writing the string to a file and opening the file in Notepad. If it looks right there, your string is correct and the application displaying it is wrong.
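
A minimal sketch of that check, assuming the file is written as UTF-8 with a byte-order mark so that Notepad has no doubt about the encoding. The names `DumpForNotepad`, `Save`, and `CodeUnits` are hypothetical; `CodeUnits` is an extra, viewer-independent check that lists each UTF-16 code unit of the string.

```csharp
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helpers for checking whether the string itself is correct.
public static class DumpForNotepad
{
    // Writes the text as UTF-8; encoderShouldEmitUTF8Identifier: true makes
    // the file start with the byte-order mark EF BB BF.
    public static void Save(string path, string text) =>
        File.WriteAllText(path, text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));

    // Lists each UTF-16 code unit as U+XXXX, independent of any font or console.
    public static string CodeUnits(string text) =>
        string.Join(" ", text.Select(c => $"U+{(int)c:X4}"));
}
```

If `CodeUnits` shows `U+00E9` where you expect "é", the string itself is fine; if it shows `U+251C U+2310`, the string really is garbled and needs a byte-level repair.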

#3


This helped me in a similar case:

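// Recover the original bytes with the code page that garbled them, then decode them as UTF-8.
// Note: Encoding.Default is the ANSI code page only on .NET Framework; on .NET Core/.NET 5+ it is UTF-8, making this a no-op.
string ok_string = System.Text.Encoding.UTF8.GetString(
    System.Text.Encoding.Default.GetBytes(bad_string));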
