
Time: 2021-01-21 20:13:10

I have a string that looks like aeroport aim├⌐


I know it is French, and I want to convert this string back to readable format. Any suggestions?


3 solutions


Heh. It's a simple cryptanalysis task. You should collect statistics of letter usage in your string. It can be by single letters, by two-letter groups or, better, by three-letter groups. Then you should collect the same statistics on a large amount of text on the same subject. Then you should rank the trigrams of French and of your fancy text by frequency and decode your cryptogram by matching them up. Of course it'll be wrong at first, but then you can use a dictionary to measure the failure ratio and apply some kind of genetic algorithm to find the best match.
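The counting step described above could be sketched roughly like this. It is only an illustration: the class and method names (`NGramStats`, `Count`, `Ranked`) are made up for this example, and it covers just the frequency statistics, not the dictionary check or the genetic search.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any library.
static class NGramStats
{
    // Counts every overlapping run of n characters in the text.
    public static Dictionary<string, int> Count(string text, int n)
    {
        var counts = new Dictionary<string, int>();
        for (int i = 0; i + n <= text.Length; i++)
        {
            string gram = text.Substring(i, n);
            counts.TryGetValue(gram, out int seen); // seen stays 0 if the gram is new
            counts[gram] = seen + 1;
        }
        return counts;
    }

    // Returns the n-grams ordered from most to least frequent
    // (ties broken by ordinal string order so the ranking is stable).
    public static List<KeyValuePair<string, int>> Ranked(string text, int n)
    {
        return Count(text, n)
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal)
            .ToList();
    }
}
```

Running `Ranked` with n = 3 on both the garbled text and a French reference corpus would give the two rankings that the answer proposes to compare.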


And by the way: if the text was originally UTF-8 but was 'forced' into a single-byte code page, you should work with bytes, not with characters.
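To see why the byte level matters, it can help to print the bytes behind a string. A minimal sketch, where `ByteView.Hex` is a hypothetical helper name chosen for this example:

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of any library.
static class ByteView
{
    // Returns the bytes of the text in the given encoding as hex, e.g. "61-69-6D".
    public static string Hex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }
}
```

`ByteView.Hex("aim\u00E9", Encoding.UTF8)` gives `61-69-6D-C3-A9`, while `ByteView.Hex("aim\u00E9", Encoding.GetEncoding("iso-8859-1"))` gives `61-69-6D-E9`: the single character é is two bytes in UTF-8 and one byte in a single-byte code page.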



That is not French; the French word for "airport" is "aéroport".


If you want to convert the string to a readable format, you have to know what encoding the original string was in, not what language. "aeroport aim├⌐" is a legal UTF-8 string.


Where are you seeing this string? In a Windows command prompt? That shows odd characters like "├⌐" for bytes above the ASCII range. The command prompt typically uses an OEM code page such as CP437, not UTF-8, so the UTF-8 string "aimé" is displayed as "aim├⌐" in CP437.
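The effect can be reproduced, and undone, by re-interpreting the same bytes. This is only a sketch under the assumption that CP437 is the code page involved. The names `Mojibake`, `ShowAsCp437` and `Repair` are invented for this example; `RegisterProvider` is there because .NET Core and .NET 5+ do not expose code page 437 without it (on .NET Framework, where 437 is built in, that line can be dropped).

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of any library.
static class Mojibake
{
    static Mojibake()
    {
        // Makes legacy code pages such as 437 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Encodes the text as UTF-8, then reads those bytes as if they were CP437.
    public static string ShowAsCp437(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding(437).GetString(utf8Bytes);
    }

    // Reverses the mistake: recovers the raw bytes via CP437, then decodes them as UTF-8.
    public static string Repair(string garbled)
    {
        byte[] rawBytes = Encoding.GetEncoding(437).GetBytes(garbled);
        return Encoding.UTF8.GetString(rawBytes);
    }
}
```

`Mojibake.ShowAsCp437("aim\u00E9")` returns "aim├⌐", and `Mojibake.Repair` applied to that result returns "aimé" again. If the text went through a different code page, the number passed to `GetEncoding` would have to change accordingly.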


If that is your situation, try writing the string to a file and opening the file in Notepad. If it looks right there, your string is correct and the application displaying it is wrong.
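One way to do that check, as a rough sketch: write the string out as UTF-8 with a byte order mark, which lets Notepad recognise the file as UTF-8. `DumpToFile.WriteUtf8` is a hypothetical helper name and the path is whatever you choose.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, not part of any library.
static class DumpToFile
{
    public static void WriteUtf8(string path, string text)
    {
        // Passing true makes the encoding emit the UTF-8 byte order mark (EF BB BF).
        File.WriteAllText(path, text, new UTF8Encoding(true));
    }
}
```

For example, `DumpToFile.WriteUtf8("check.txt", myString)` followed by opening check.txt in Notepad shows the string without the console's code page in the way.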



This helped me in a similar case:


string ok_string = System.Text.Encoding.UTF8.GetString(

