I have the following code, which gives me an invalid byte sequence error pointing to the scan method in initialize. Any ideas on how to fix this? For what it's worth, the error does not occur when the (.*) between the h1 tag and the closing > is not there.

#!/usr/bin/env ruby

class NewsParser
  def initialize
    Dir.glob("./**/index.htm") do |file|
      @file = IO.read file
      parsed = @file.scan(/<h1(.*)>(.*?)<\/h1>(.*)<!-- InstanceEndEditable -->/im)
      self.write(parsed)
    end
  end

  def write output
    @contents = output
    open('output.txt', 'a') do |f|
      f << @contents[0][0]+"\n\n"+@contents[0][1]+"\n\n\n\n"
    end
  end
end

p = NewsParser.new

Edit: Here is the error message:
news_parser.rb:10:in 'scan': invalid byte sequence in UTF-8 (ArgumentError)
SOLVED: The combination of using @file = IO.read(file).force_encoding("ISO-8859-1").encode("utf-8", replace: nil) and the magic comment # encoding: UTF-8 solved the issue.
Thanks!
1 Answer
#1
34
The combination of using @file = IO.read(file).force_encoding("ISO-8859-1").encode("utf-8", replace: nil) and the magic comment # encoding: UTF-8 solved the issue.
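
A minimal sketch of why this works, assuming the index.htm files really are ISO-8859-1 (if they are Windows-1252 or something else, the label passed to force_encoding would need to change). The helper name latin1_to_utf8 and the sample markup are made up for illustration. force_encoding only relabels the bytes; encode then transcodes them, so the result is valid UTF-8 and can be scanned. Every byte value is defined in ISO-8859-1, so the replace: option is left out here.

```ruby
# encoding: UTF-8
# (Ruby 2.0+ already defaults to UTF-8 source; the magic comment matters on 1.9.)

# Hypothetical helper: treat raw bytes as ISO-8859-1 and transcode to UTF-8.
def latin1_to_utf8(raw)
  # dup so the caller's string is not relabelled in place
  raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")
end

# 0xE9 is "é" in ISO-8859-1, but a lone 0xE9 is not valid UTF-8.
raw = "<h1 class=\"t\">Caf\xE9</h1>".b

puts raw.dup.force_encoding("UTF-8").valid_encoding?  # bytes read as UTF-8

html = latin1_to_utf8(raw)
puts html.valid_encoding?                             # bytes transcoded to UTF-8

# Same shape as the pattern in the question, trimmed to the h1 part.
p html.scan(/<h1(.*)>(.*?)<\/h1>/im)
```

In the parser from the question, the equivalent change would be to pass the result of IO.read(file) through such a conversion before calling scan.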