I have a .csv file encoded in UTF-8 that contains both Latin and Cyrillic characters.
;F1;F2;abcdefg3;F200
;ABSOLUTE;NOMINAL;NOMINAL;NOMINAL
o1;1;USA;Новосибирск;1223
I'm trying to execute the following script in IronPython 2.7.1:
import codecs
f = codecs.open(r"file.csv", "rb", "utf-8")
f.next()
During the execution of f.next() an exception occurs:
Traceback (most recent call last):
  File "c:\Program Files\Microsoft Visual Studio 10.0\Common7\IDE\Extensions\Microsoft\Python Tools for Visual Studio\1.1\visualstudio_py_repl.py", line 492, in run_file_as_main
    code.Execute(self.exec_mod)
  File "<string>", line 4, in <module>
  File "C:\Program Files\IronPython 2.7.1\Lib\codecs.py", line 684, in next
    return self.reader.next()
  File "C:\Program Files\IronPython 2.7.1\Lib\codecs.py", line 615, in next
    line = self.readline()
  File "C:\Program Files\IronPython 2.7.1\Lib\codecs.py", line 530, in readline
    data = self.read(readsize, firstline=True)
  File "C:\Program Files\IronPython 2.7.1\Lib\codecs.py", line 477, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeEncodeError: ('unknown', '\x00', 0, 1, '')
The same script works correctly in CPython 2.7. In IronPython 2.7.1, the following script, which calls readlines() instead of next(), also works fine:
import codecs
f = codecs.open(r"file.csv", "rb", "utf-8")
f.readlines()
Does anybody know what may cause such strange behaviour?
2 Answers
#1
2
Looks like it could be a bug in how next() handles codecs. Can you please open an issue with the files to reproduce attached?
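For a bug report, something like the following might serve as a self-contained reproduction. It is only a sketch: the helper names (write_sample, first_line_via_next, all_lines_via_readlines) are made up for illustration, the sample data is copied from the question, and next(f) is used in place of f.next() so the same file also runs on Python 3. Whether it triggers the error on a given IronPython build is something only a run there can tell.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

# Sample rows from the question; the Cyrillic city name is written with
# escapes so the script does not depend on the source file's own encoding.
SAMPLE = (
    u";F1;F2;abcdefg3;F200\n"
    u";ABSOLUTE;NOMINAL;NOMINAL;NOMINAL\n"
    u"o1;1;USA;\u041d\u043e\u0432\u043e\u0441\u0438\u0431\u0438\u0440\u0441\u043a;1223\n"
)


def write_sample(path):
    """Write the sample CSV as UTF-8 bytes (hypothetical helper)."""
    with open(path, "wb") as out:
        out.write(SAMPLE.encode("utf-8"))


def first_line_via_next(path):
    """Read the first line through the iterator protocol (the failing path)."""
    f = codecs.open(path, "rb", "utf-8")
    try:
        return next(f)
    finally:
        f.close()


def all_lines_via_readlines(path):
    """Read every line through readlines() (the path reported to work)."""
    f = codecs.open(path, "rb", "utf-8")
    try:
        return f.readlines()
    finally:
        f.close()


if __name__ == "__main__":
    sample_path = os.path.join(tempfile.mkdtemp(), "file.csv")
    write_sample(sample_path)
    print(repr(all_lines_via_readlines(sample_path)))
    print(repr(first_line_via_next(sample_path)))
```

Running readlines() first shows the file itself decodes cleanly, so a failure on the second call points at the next() code path rather than at the data.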
#2
0
The problem may be the "rb" mode parameter; try using "r" instead:
f = codecs.open(r"file.csv", "r", "utf-8")
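One caveat on the mode: in the standard-library codecs.py, codecs.open() adds "b" to the mode whenever an encoding is given, so "r" and "rb" should end up opening the file the same way. Since the question says readlines() works in IronPython 2.7.1, a possible stopgap is to read all lines up front and iterate over the resulting list. This is a rough sketch, not a fix for the underlying issue; iter_decoded_lines is a hypothetical name, and it loads the whole file into memory.

```python
import codecs


def iter_decoded_lines(path, encoding="utf-8"):
    """Return an iterator over decoded lines, avoiding the reader's next().

    Hypothetical helper: it relies on readlines(), which the question
    reports as working, and trades memory for that (whole file is loaded).
    """
    f = codecs.open(path, "r", encoding)
    try:
        lines = f.readlines()
    finally:
        f.close()
    return iter(lines)
```

With that helper, next(iter_decoded_lines("file.csv")) would stand in for f.next() on the first line, and a plain for loop over the result covers the rest.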