I have to load some data from external sources. When I look at the encoding, Ruby tells me ASCII-8BIT, i.e. a binary file. However, some of the sources are encoded in ISO-8859-1 and some of them in UTF-8. When I try to convert the ISO-8859-1 encoded data to UTF-8, I get an error. But when I do something like content.force_encoding('ISO-8859-1').encode('UTF-8'), everything works fine.
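The difference between the failing conversion and the working one can be reproduced with a small, made-up sample. The string "café" stored as ISO-8859-1 bytes is an assumption for illustration, not data from the question. The point is that force_encoding only changes the label on the bytes, while encode actually transcodes them, so the bytes need the right label before encode can do anything useful:

```ruby
# Made-up sample: "café" as ISO-8859-1 bytes (é is the single byte 0xE9),
# tagged ASCII-8BIT the way a binary read would tag it.
raw = "caf\xE9".b
raw.encoding == Encoding::ASCII_8BIT # => true

# Encoding directly fails: Ruby has no mapping for byte 0xE9 in ASCII-8BIT,
# so it raises instead of guessing.
begin
  raw.encode('UTF-8')
rescue Encoding::UndefinedConversionError => e
  puts "direct encode failed: #{e.message}"
end

# force_encoding relabels a copy of the bytes without changing them;
# encode then transcodes from ISO-8859-1 to UTF-8.
utf8 = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')
utf8       # => "café"
utf8.bytes # => [99, 97, 102, 195, 169]
```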
However, this doesn't work the other way round. When I try to encode the UTF-8 data to ISO-8859-1, I end up with broken characters.
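The question does not show the exact code or output for the reverse direction, so the following is only a guess at one common cause: the UTF-8 bytes must be labelled UTF-8 (not ISO-8859-1, and not left as ASCII-8BIT) before encoding to ISO-8859-1. The sample string is again made up. Another possibility is that the conversion worked, but the resulting ISO-8859-1 bytes were then printed somewhere that expects UTF-8, which also shows up as broken characters.

```ruby
# Made-up sample: "café" as UTF-8 bytes (é is 0xC3 0xA9), tagged ASCII-8BIT.
raw = "café".b

# Correct: label the bytes as UTF-8 first, then transcode to ISO-8859-1.
latin1 = raw.dup.force_encoding('UTF-8').encode('ISO-8859-1')
latin1.bytes # => [99, 97, 102, 233]

# Mislabelled: reading UTF-8 bytes as ISO-8859-1 turns the two-byte
# sequence for "é" into two separate characters ("Ã" and "©").
mojibake = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')
mojibake # => "cafÃ©"
```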
So, is there a way to detect the "underlying" encoding of the ASCII-8BIT data, and then convert it to UTF-8?
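Ruby's standard library has no general-purpose encoding detector, but if the sources really are limited to UTF-8 and ISO-8859-1, a simple heuristic may be enough: try UTF-8 first, and fall back to ISO-8859-1 only if the bytes are not valid UTF-8. This is a swapped-in technique, not a real detector, and the helper name to_utf8 is hypothetical:

```ruby
# Hypothetical helper. Assumes every input is either UTF-8 or ISO-8859-1.
def to_utf8(data)
  # Relabel a copy so the caller's string keeps its original encoding.
  utf8 = data.dup.force_encoding(Encoding::UTF_8)

  # UTF-8 has a strict multibyte structure, so accented ISO-8859-1 text
  # rarely passes this check by accident.
  return utf8 if utf8.valid_encoding?

  # Every byte value maps to a code point in ISO-8859-1, so this fallback
  # never raises (and is also why ISO-8859-1 cannot be "detected" directly).
  data.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

to_utf8("caf\xE9".b) # => "café" (bytes treated as ISO-8859-1)
to_utf8("café".b)    # => "café" (bytes were already valid UTF-8)
```

The trade-off is that anything which is not valid UTF-8 gets read as ISO-8859-1. For example, Windows-1252 text (which uses bytes 0x80 to 0x9F for characters such as curly quotes) would come out with control characters instead.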
1 answer
#1
I had a quick google and found the Charlock Holmes gem by Brian Lopez. It looks like it does the detection process you're after.
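Going by the gem's README, Charlock Holmes exposes CharlockHolmes::EncodingDetector.detect and CharlockHolmes::Converter.convert. The following is a rough sketch combining them, assuming the gem is installed (it builds against the ICU library). The helper name detect_and_convert is hypothetical, and it returns nil when the gem is missing or no encoding is reported:

```ruby
begin
  require 'charlock_holmes'
rescue LoadError
  # Gem not installed: detect_and_convert below will return nil.
end

# Hypothetical helper: detect the encoding of raw bytes, then convert to UTF-8.
def detect_and_convert(data)
  return nil unless defined?(CharlockHolmes)

  # Returns a Hash with keys such as :type, :encoding and :confidence;
  # binary content comes back without an :encoding.
  detection = CharlockHolmes::EncodingDetector.detect(data)
  return nil if detection.nil? || detection[:encoding].nil?

  CharlockHolmes::Converter
    .convert(data, detection[:encoding], 'UTF-8')
    .force_encoding(Encoding::UTF_8)
end
```

Detection of this kind is statistical, so it can be worth checking detection[:confidence] before trusting the result, especially for short inputs.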