I have a pickled object (a list with a few numpy arrays in it) that was created on Windows and apparently saved to a file opened in text mode rather than binary mode (i.e. with open(filename, 'w') instead of open(filename, 'wb')). As a result I can't unpickle it now (not even on Windows), because it's infected with \r characters (and possibly more?). The main complaint is
ImportError: No module named multiarray
supposedly because it's looking for numpy.core.multiarray\r, which of course doesn't exist. Simply removing the \r characters didn't do the trick (I tried both sed -e 's/\r//g' and, in Python, s = file.read().replace('\r', ''), but both break the file and yield a cPickle.UnpicklingError later on).
The problem is that I really need to get the data out of the objects. Any ideas how to fix the files?
Edit: On request, here are the first few hundred bytes of my file, as printed by repr():
\x80\x02]q\x01(}q\x02(U\r\ntotal_timeq\x03G?\x90\x15r\xc9(s\x00U\rreaction_timeq\x04NU\x0ejump_directionq\x05cnumpy.core.multiarray\r\nscalar\r\nq\x06cnumpy\r\ndtype\r\nq\x07U\x02f8K\x00K\x01\x87Rq\x08(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\x025\x9d\x13\xfc#\xc8?\x86Rq\tU\x14normalised_directionq\r\nh\x06h\x08U\x08\xf0\xf9,\x0eA\x18\xf8?\x86Rq\x0bU\rjump_distanceq\x0ch\x06h\x08U\x08\x13\x14\xea&\xb0\x9b\x1a@\x86Rq\rU\x04jumpq\x0ecnumpy.core.multiarray\r\n_reconstruct\r\nq\x0fcnumpy\r\nndarray\r\nq\x10K\x00\x85U\x01b\x87Rq\x11(K\x01K\x02\x85h\x08\x89U\x10\x87\x16\xdaEG\xf4\xf3?\x06`OC\xe7"\x1a@tbU\x0emovement_speedq\x12h\x06h\x08U\x08\\p\xf5[2\xc2\xef?\x86Rq\x13U\x0ctrial_lengthq\x14G@\t\x98\x87\xf8\x1a\xb4\xbaU\tconditionq\x15U\x0bhigh_mentalq\x16U\x07subjectq\x17K\x02U\x12movement_directionq\x18h\x06h\x08U\x08\xde\x06\xcf\x1c50\xfd?\x86Rq\x19U\x08positionq\x1ah\x0fh\x10K\x00\x85U\x01b\x87Rq\x1b(K\x01K\x02\x85h\x08\x89U\x10K\xb7\xb4\x07q=\x1e\xc0\xf2\xc2YI\xb7U&\xc0tbU\x04typeq\x1ch\x0eU\x08movementq\x1dh\x0fh\x10K\x00\x85U\x01b\x87Rq\x1e(K\x01K\x02\x85h\x08\x89U\x10\xad8\x9c9\x10\xb5\xee\xbf\xffa\xa2hWR\xcf?tbu}q\x1f(h\x03G@\t\xba\xbc\xb8\xad\xc8\x14h\x04G?\xd9\x99%]\xadV\x00h\x05h\x06h\x08U\x08\xe3X\xa9=\xc1\xb1\xeb?\x86Rq h\r\nh\x06h\x08U\x08\x88\xf7\xb9\xc1\t\xd6\xff?\x86Rq!h\x0ch\x06h\x08U\x08v\x7f\xeb\x11\xea5\r@\x86Rq"h\x0eh\x0fh\x10K\x00\x85U\x01b\x87Rq#(K\x01K\x02\x85h\x08\x89U\x10\xcd\xd9\x92\x9a\x94=\x06@]C\xaf\xef\xeb\xef\x02@tbh\x12h\x06h\x08U\x08-\x9c&\x185\xfd\xef?\x86Rq$h\x14G@\r\xb8W\xb2`V\xach\x15h\x16h\x17K\x02h\x18h\x06h\x08U\x08\x8e\x87\xd1\xc2
You may also download the whole file (22k).
4 answers
#1
11
Presuming that the file was created with the default protocol=0 ASCII-compatible method, you should be able to load it anywhere by using open('pickled_file', 'rU'), i.e. universal newlines.
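A minimal Python 3 sketch of why universal newlines rescue a protocol-0 pickle (an illustration under stated assumptions, not the answerer's code). It simulates the text-mode damage by turning every b'\n' into b'\r\n', then reads the bytes back through io.TextIOWrapper with newline=None, which performs the same translation that 'rU' did in Python 2 (the 'U' mode itself was removed in Python 3.11). The sample dictionary and the load_universal name are hypothetical, and latin-1 is used only as a lossless bytes-to-text mapping.

```python
import io
import pickle

def load_universal(raw: bytes):
    """Hypothetical helper: read pickle bytes through a universal-newlines layer, like open(path, 'rU')."""
    # newline=None: '\r\n' and a lone '\r' are both translated to '\n' on read.
    text = io.TextIOWrapper(io.BytesIO(raw), encoding="latin-1", newline=None).read()
    return pickle.loads(text.encode("latin-1"))

# Hypothetical sample data; protocol 0 is line-oriented ASCII.
obj = {"total_time": 1.5, "subject": 2, "condition": "high_mental"}
clean = pickle.dumps(obj, protocol=0)
damaged = clean.replace(b"\n", b"\r\n")  # what a text-mode write on Windows does

print(load_universal(damaged) == obj)  # True
```

This trick is only safe for protocol 0: universal newlines also turn a lone b'\r' into '\n', and in binary protocols a 0x0D byte can be ordinary data.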
If this doesn't work, show us the first few hundred bytes: print repr(open('pickled_file', 'rb').read(200)) and paste the results into an edit of your question.
Update after file contents were published:
Your file starts with '\x80\x02'; it was dumped with protocol 2, the latest and best protocol available in Python 2. Protocols 1 and 2 are binary protocols. Your file was written in text mode on Windows, so the C runtime converted each '\n' to '\r\n'. Files should be opened in binary mode like this:
import pickle

with open('result.pickle', 'wb') as f:  # b for binary
    pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)

with open('result.pickle', 'rb') as f:  # b for binary
    obj = pickle.load(f)
See the pickle module documentation. This code works portably on both Windows and non-Windows systems.
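The protocol check made in this update can be reproduced with a small helper. This is a sketch that relies only on the standard pickle format: protocol 2 and later begin with the PROTO opcode b'\x80' followed by the protocol number, while protocols 0 and 1 have no such header. The function name pickle_protocol is hypothetical.

```python
import pickle
from typing import Optional

def pickle_protocol(header: bytes) -> Optional[int]:
    """Hypothetical helper: guess the protocol from the first bytes of a pickle."""
    # PROTO opcode (0x80) plus one protocol byte, introduced with protocol 2.
    if len(header) >= 2 and header[0] == 0x80:
        return header[1]
    return None  # protocol 0 or 1: no PROTO header

print(pickle_protocol(b"\x80\x02]q\x01(}q\x02("))  # 2 (start of the file in the question)
print(pickle_protocol(pickle.dumps([1, 2], protocol=0)))  # None
```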
You can recover the original pickle image by reading the file in binary mode and then reversing the damage, replacing every occurrence of '\r\n' with '\n'. Note: this recovery procedure is necessary whether or not you are trying to read the file on Windows.
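The recovery procedure can be sketched in Python 3 as follows. The helper and file names are hypothetical, and encoding='latin1' is an assumption about loading the result under Python 3: the pickle documentation recommends it for NumPy arrays pickled by Python 2 (under Python 2 itself, a plain load of the repaired bytes is enough). Because the text-mode write only inserted a b'\r' before each b'\n', replacing b'\r\n' with b'\n' is an exact inverse, even when the binary data contained its own b'\r\n'.

```python
import pickle
from pathlib import Path

def repair_text_mode_pickle(damaged: bytes) -> bytes:
    r"""Undo Windows text-mode output, which wrote every b'\n' as b'\r\n'."""
    return damaged.replace(b"\r\n", b"\n")

def load_damaged_pickle(path: str):
    raw = Path(path).read_bytes()  # binary read: no newline translation
    fixed = repair_text_mode_pickle(raw)
    # Per the pickle docs, encoding='latin1' is needed for NumPy arrays pickled by Python 2.
    return pickle.loads(fixed, encoding="latin1")

def repair_file(src: str, dst: str) -> None:
    """Write a repaired copy that can then be loaded with open(dst, 'rb')."""
    Path(dst).write_bytes(repair_text_mode_pickle(Path(src).read_bytes()))

# Usage (hypothetical file names):
# data = load_damaged_pickle("results.pickle")
# repair_file("results.pickle", "results_fixed.pickle")
```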
#2
5
Newlines in Windows aren't just '\r'; they're CRLF, i.e. '\r\n'.
Give file.read().replace('\r\n', '\n') a try. Previously you were deleting carriage returns that may not actually have been part of newlines.
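To see why deleting every carriage return broke the file while the targeted replacement works, here is a small Python 3 demonstration using hypothetical sample data rather than the asker's file. In a binary pickle a 0x0D byte can be data in its own right: the U\r before reaction_time in the dump above is a one-byte string length of 13, and in this sketch the integer 13 is stored as the bytes b'K\r'. The try_load helper is also hypothetical.

```python
import pickle

def try_load(data: bytes):
    """Return the unpickled object, or the exception if the bytes are unreadable."""
    try:
        return pickle.loads(data)
    except Exception as exc:  # corrupt pickles can raise several exception types
        return exc

obj = [13, 10]  # stored by protocol 2 as b'K\r' and b'K\n'
clean = pickle.dumps(obj, protocol=2)
damaged = clean.replace(b"\n", b"\r\n")  # text-mode write on Windows

stripped = damaged.replace(b"\r", b"")      # the sed / replace('\r', '') attempt
repaired = damaged.replace(b"\r\n", b"\n")  # undo only the inserted CRs

print(stripped == clean, try_load(stripped))  # False, and the load fails or returns different data
print(repaired == clean, try_load(repaired))  # True [13, 10]
```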
#3
0
Can't you, on Windows, just open the file in text mode (the same way it was written), read it in, and then write it out to another file opened properly in binary mode?
#4
0
Have you tried unpickling in text mode? That is,
x = pickle.load(open(filename, 'r'))
(On Windows, of course.)