I am trying to extract a word from the first line of a file:
LOCATION,Feij�,AC,a,b,c
this way:
2.0.0-p247 :005 > File.foreach(file).first
=> "LOCATION,Feij\xF3,AC,a,b,c\r\n"
but when I try to use split:
2.0.0-p247 :008 > File.foreach(file).first.split(",")
ArgumentError: invalid byte sequence in UTF-8
    from (irb):8:in `split'
    from (irb):8
    from /home/bleh/.rvm/rubies/ruby-2.0.0-p247/bin/irb:13:in
What I expected is: Feijó
I have already tried many combinations of .encode and .force_encoding.
Any ideas?
1 solution
The character ó is \xF3 in the ISO-8859-1 encoding, so this is probably the encoding of the file (it could also be CP-1252, which encodes ó as the same byte).
You can specify the encoding as an argument to File::foreach, and you can also ask Ruby to re-encode the contents to UTF-8 for you:
File.foreach(file, :encoding => 'iso-8859-1:utf-8').first.split(",")
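To try this without the original file, here is a minimal sketch that writes the sample line to a temporary file as raw ISO-8859-1 bytes and then reads it back in two ways. The temp file and the variable names (sample, fields_transcoded, fields_relabelled) are made up for illustration, and it assumes the file really is ISO-8859-1; for CP-1252, the encoding names would need to change accordingly.

```ruby
require 'tempfile'

# Stand-in for the real input file: the sample line written as raw bytes,
# where \xF3 is "ó" in ISO-8859-1 (assumed encoding of the original file).
sample = Tempfile.new('latin1_sample')
sample.binmode
sample.write("LOCATION,Feij\xF3,AC,a,b,c\r\n".b)
sample.close

# Option 1: declare the external encoding and let Ruby transcode to UTF-8
# while reading ("external:internal").
line = File.foreach(sample.path, :encoding => 'iso-8859-1:utf-8').first
fields_transcoded = line.chomp.split(',')

# Option 2: read the raw bytes, label them as ISO-8859-1, then convert
# explicitly. force_encoding only relabels; encode does the conversion.
raw = File.binread(sample.path)
fields_relabelled = raw.force_encoding('ISO-8859-1')
                       .encode('UTF-8')
                       .lines.first.chomp.split(',')

puts fields_transcoded[1]
```

Calling force_encoding('UTF-8') on its own would not help here, because it only changes the label on bytes that are still not valid UTF-8; the string has to be labelled with its real encoding before encode can convert it.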