Porting pickle py2 to py3 strings become bytes

Time: 2022-05-30 03:55:05

I have a pickle file that was created with python 2.7 that I'm trying to port to python 3.6. The file is saved in py 2.7 via pickle.dumps(self.saved_objects, -1)

and loaded in python 3.6 via loads(data, encoding="bytes") (from a file opened in rb mode). If I try opening in r mode and pass encoding=latin1 to loads I get UnicodeDecode errors. When I open it as a byte stream it loads, but literally every string is now a byte string. Every object's __dict__ keys are all b"a_variable_name" which then generates attribute errors when calling an_object.a_variable_name because __getattr__ passes a string and __dict__ only contains bytes. I feel like I've tried every combination of arguments and pickle protocols already. Apart from forcibly converting all objects' __dict__ keys to strings I'm at a loss. Any ideas?

** Skip to 4/28/17 update for better example

-------------------------------------------------------------------------------------------------------------

** Update 4/27/17

This minimum example illustrates my problem:

From py 2.7.13

import pickle

class test(object):
    def __init__(self):
        self.x = u"test ¢" # including a unicode str breaks things

t = test()
dumpstr = pickle.dumps(t)

>>> dumpstr
"ccopy_reg\n_reconstructor\np0\n(c__main__\ntest\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n(dp5\nS'x'\np6\nVtest \xa2\np7\nsb."

From py 3.6.1

import pickle

class test(object):
    def __init__(self):
        self.x = "xyz"

dumpstr = b"ccopy_reg\n_reconstructor\np0\n(c__main__\ntest\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n(dp5\nS'x'\np6\nVtest \xa2\np7\nsb."

t = pickle.loads(dumpstr, encoding="bytes")

>>> t
<__main__.test object at 0x040E3DF0>
>>> t.x
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    t.x
AttributeError: 'test' object has no attribute 'x'
>>> t.__dict__
{b'x': 'test ¢'} 
>>> 

-------------------------------------------------------------------------------------------------------------

Update 4/28/17

To re-create my issue I'm posting my actual raw pickle data here

The pickle file was created in python 2.7.13, windows 10 using

with open("raw_data.pkl", "wb") as fileobj:
    pickle.dump(library, fileobj, protocol=0)

(protocol 0 so it's human readable)

To run it you'll need classes.py

# classes.py

class Library(object): pass


class Book(object): pass


class Student(object): pass


class RentalDetails(object): pass

And the test script here:

# load_pickle.py
import pickle, sys, itertools, os

raw_pkl = "raw_data.pkl"
is_py3 = sys.version_info.major == 3

read_modes = ["rb"]
encodings = ["bytes", "utf-8", "latin-1"]
fix_imports_choices = [True, False]
files = ["raw_data_%s.pkl" % x for x in range(3)]


def py2_test():
    with open(raw_pkl, "rb") as fileobj:
        loaded_object = pickle.load(fileobj)
        print("library dict: %s" % (loaded_object.__dict__.keys()))
        return loaded_object


def py2_dumps():
    library = py2_test()
    for protocol, path in enumerate(files):
        print("dumping library to %s, protocol=%s" % (path, protocol))
        with open(path, "wb") as writeobj:
            pickle.dump(library, writeobj, protocol=protocol)


def py3_test():
    # this test iterates over the different options trying to load
    # the data pickled with py2 into a py3 environment
    print("starting py3 test")
    for (read_mode, encoding, fix_import, path) in itertools.product(read_modes, encodings, fix_imports_choices, files):
        py3_load(path, read_mode=read_mode, fix_imports=fix_import, encoding=encoding)


def py3_load(path, read_mode, fix_imports, encoding):
    from traceback import print_exc
    print("-" * 50)
    print("path=%s, read_mode = %s fix_imports = %s, encoding = %s" % (path, read_mode, fix_imports, encoding))
    if not os.path.exists(path):
        print("start this file with py2 first")
        return
    try:
        with open(path, read_mode) as fileobj:
            loaded_object = pickle.load(fileobj, fix_imports=fix_imports, encoding=encoding)
            # print the object's __dict__
            print("library dict: %s" % (loaded_object.__dict__.keys()))
            # consider the test a failure if any member attributes are saved as bytes
            test_passed = not any((isinstance(k, bytes) for k in loaded_object.__dict__.keys()))
            print("Test %s" % ("Passed!" if test_passed else "Failed"))
    except Exception:
        print_exc()
        print("Test Failed")
    input("Press Enter to continue...")
    print("-" * 50)


if is_py3:
    py3_test()
else:
    # py2_test()
    py2_dumps()

Put all three files in the same directory and run c:\python27\python load_pickle.py first, which creates one pickle file for each of the 3 protocols. Then run the same command with Python 3 and notice that the Python 3 version converts the __dict__ keys to bytes. I had it working for about 6 hours, but for the life of me I can't figure out how I broke it again.

3 Solutions

#1


6  

In short, you're hitting bug 22005 with datetime.date objects in the RentalDetails objects.

That can be worked around with the encoding='bytes' parameter, but that leaves your classes with __dict__ containing bytes:

>>> library = pickle.loads(pickle_data, encoding='bytes')
>>> dir(library)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'str' and 'bytes'

It's possible to manually fix that based on your specific data:

def fix_object(obj):
    """Decode obj.__dict__ containing bytes keys"""
    obj.__dict__ = dict((k.decode("ascii"), v) for k, v in obj.__dict__.items())


def fix_library(library):
    """Walk all library objects and decode __dict__ keys"""
    fix_object(library)
    for student in library.students:
        fix_object(student)
    for book in library.books:
        fix_object(book)
        for rental in book.rentals:
            fix_object(rental)
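
The same idea can be written without hard-coding the attribute names. The following is only a sketch: decode_dict_keys is a hypothetical helper (not part of pickle), and it assumes the object graph consists of plain class instances, lists, tuples, sets and dicts, and is shallow enough for recursion.

```python
def decode_dict_keys(obj, encoding="ascii", _seen=None):
    """Recursively decode bytes keys in the __dict__ of obj and of
    everything reachable from it (hypothetical helper, a sketch only)."""
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:
        # Already visited: this also stops back-reference cycles.
        return obj
    _seen.add(id(obj))

    if isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            decode_dict_keys(item, encoding, _seen)
    elif isinstance(obj, dict):
        for value in obj.values():
            decode_dict_keys(value, encoding, _seen)
    elif hasattr(obj, "__dict__") and not isinstance(obj, type):
        # Rebuild the instance dict with str keys, leaving values untouched.
        obj.__dict__ = {
            (k.decode(encoding) if isinstance(k, bytes) else k): v
            for k, v in obj.__dict__.items()
        }
        for value in obj.__dict__.values():
            decode_dict_keys(value, encoding, _seen)
    return obj
```

Note that this only repairs attribute names. Values that were plain Python 2 str objects are still bytes after loading with encoding='bytes' and would have to be decoded separately.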

But that's fragile and enough of a pain you should be looking for a better option.

1) Implement __getstate__/__setstate__ that maps datetime objects to a non-broken representation, for instance:

import datetime


class Event(object):
    """Example class working around datetime pickling bug"""

    def __init__(self):
        self.date = datetime.date.today()

    def __getstate__(self):
        # Store the date as a plain integer so no raw byte payload is pickled
        state = self.__dict__.copy()
        state["date"] = state["date"].toordinal()
        return state

    def __setstate__(self, state):
        # Restore the attributes, then turn the ordinal back into a date
        self.__dict__.update(state)
        self.date = datetime.date.fromordinal(self.date)

2) Don't use pickle at all. Along the lines of __getstate__/__setstate__, you can just implement to_dict/from_dict methods or similar in your classes for saving their content as json or some other plain format.
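
A minimal sketch of the to_dict/from_dict approach, assuming a JSON file as the plain format. RentalRecord and its book_title attribute are hypothetical stand-ins, not the classes from the question; only rental_date is a name taken from the data above.

```python
import datetime
import json


class RentalRecord(object):
    """Hypothetical stand-in for a class such as RentalDetails."""

    def __init__(self, book_title, rental_date):
        self.book_title = book_title
        self.rental_date = rental_date

    def to_dict(self):
        # Dates are not JSON serializable, so store them as ISO strings.
        return {
            "book_title": self.book_title,
            "rental_date": self.rental_date.isoformat(),
        }

    @classmethod
    def from_dict(cls, data):
        # Parse "YYYY-MM-DD" by hand so this also runs on Python 2.7.
        year, month, day = (int(part) for part in data["rental_date"].split("-"))
        return cls(data["book_title"], datetime.date(year, month, day))


record = RentalRecord(u"Some Book", datetime.date(2017, 2, 16))
text = json.dumps(record.to_dict())
restored = RentalRecord.from_dict(json.loads(text))
```

Because the JSON text contains only strings and numbers, it reads back the same way under Python 2 and Python 3.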

A final note, having a backreference to library in each object shouldn't be required.

#2


1  

You should treat pickle data as specific to the (major) version of Python that created it.

(See Gregory Smith's message w.r.t. issue 22005.)

The best way to get around this is to write a Python 2.7 program to read the pickled data, and write it out in a neutral format.

Taking a quick look at your actual data, it seems to me that an SQLite database is appropriate as an interchange format, since the Books contain references to a Library and RentalDetails. You could create separate tables for each.
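
As a rough sketch of such an export, the function below writes books and their rentals into two SQLite tables. The schema and the title attribute are assumptions made for illustration; only library.books, book.rentals and rental.rental_date appear in the thread. In the real conversion it would run under Python 2.7 after pickle.load.

```python
import datetime
import sqlite3


def export_library(library, conn):
    """Write books and rentals to SQLite (hypothetical schema, sketch only)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS books "
            "(id INTEGER PRIMARY KEY, title TEXT)"
        )
        conn.execute(
            "CREATE TABLE IF NOT EXISTS rentals "
            "(book_id INTEGER REFERENCES books(id), rental_date TEXT)"
        )
        for book in library.books:
            cur = conn.execute(
                "INSERT INTO books (title) VALUES (?)", (book.title,)
            )
            for rental in book.rentals:
                # Store dates as ISO text so any reader can parse them.
                conn.execute(
                    "INSERT INTO rentals (book_id, rental_date) VALUES (?, ?)",
                    (cur.lastrowid, rental.rental_date.isoformat()),
                )
```

Usage would be along the lines of export_library(library, sqlite3.connect("library.db")). Under Python 2, non-ASCII byte strings would need decoding to unicode before being inserted.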

#3


1  

Question: Porting pickle py2 to py3 strings become bytes

The encoding='latin-1' given below is fine.
Your problem with the b'' keys is the result of using encoding='bytes', which causes dict keys to be unpickled as bytes instead of str.
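
The effect of the two encodings can be seen on a small example. The byte string below is a hand-written protocol 0 stream (an assumption: it mimics a Python 2 pickle of {'x': u'test ¢'} and is not taken from raw_data.pkl).

```python
import pickle

# Protocol 0 stream: a dict with a Python 2 str key (S opcode)
# and a unicode value (V opcode).
PY2_DICT_PICKLE = b"(dp0\nS'x'\np1\nVtest \xa2\np2\ns."

# encoding="bytes" leaves Python 2 str objects as bytes, including dict keys.
as_bytes = pickle.loads(PY2_DICT_PICKLE, encoding="bytes")

# encoding="latin-1" decodes Python 2 str objects to str.
as_latin1 = pickle.loads(PY2_DICT_PICKLE, encoding="latin-1")

print(as_bytes)   # {b'x': 'test ¢'}
print(as_latin1)  # {'x': 'test ¢'}
```

In both cases the unicode value comes back as str; only objects that were byte strings in Python 2 are affected by the encoding argument.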

The Problem data are the datetime.date values '\x07á\x02\x10', starting at line 56 in raw-data.pkl.
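
To see why this payload is a problem, it helps to decode it by hand. The sketch below uses a hand-written protocol 0 stream for a single datetime.date (an assumption, not a copy of raw_data.pkl): the four bytes are the year as two big-endian bytes, then month and day.

```python
import datetime
import pickle

# Hand-written protocol 0 pickle of datetime.date(2017, 2, 16) as Python 2
# would emit it: the constructor argument is a 4-byte str payload.
PY2_DATE_PICKLE = b"cdatetime\ndate\np0\n(S'\\x07\\xe1\\x02\\x10'\np1\ntp2\nRp3\n."

payload = b"\x07\xe1\x02\x10"
year = payload[0] * 256 + payload[1]   # 0x07e1 == 2017
month, day = payload[2], payload[3]    # 2 and 16
decoded = datetime.date(year, month, day)

# With encoding="bytes" the payload reaches datetime.date as bytes and works.
loaded = pickle.loads(PY2_DATE_PICKLE, encoding="bytes")

# With the default encoding ("ASCII") the byte 0xe1 cannot be decoded.
try:
    pickle.loads(PY2_DATE_PICKLE)
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

print(decoded, loaded)
```

The byte 0xe1 is what shows up as 'á' once the payload is decoded as latin-1.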

It's a known issue, as already pointed out:
Unpickling python2 datetime under python3
http://bugs.python.org/issue22005

For a workaround, I have patched pickle.py and got unpickled object, e.g.

book.library.books[0].rentals[0].rental_date=2017-02-16
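
The patch to pickle.py itself is not shown here. As an alternative that leaves the standard library untouched, one could subclass pickle.Unpickler and repair the date payload in find_class. This is not the patch mentioned above, only a sketch: Py2DateUnpickler, _date_from_py2 and load_py2_pickle are hypothetical names, and the sample stream is hand-written.

```python
import datetime
import io
import pickle


def _date_from_py2(*args):
    """Build a datetime.date, re-encoding a latin-1 decoded Python 2 payload."""
    if len(args) == 1 and isinstance(args[0], str):
        # The 4-byte payload arrived as str; turn it back into bytes.
        args = (args[0].encode("latin-1"),)
    return datetime.date(*args)


class Py2DateUnpickler(pickle.Unpickler):
    """Hypothetical helper: an Unpickler that repairs Python 2 date payloads."""

    def find_class(self, module, name):
        if (module, name) == ("datetime", "date"):
            return _date_from_py2
        return super().find_class(module, name)


def load_py2_pickle(data):
    # latin-1 keeps attribute names and other Python 2 str objects as str.
    return Py2DateUnpickler(io.BytesIO(data), encoding="latin-1").load()


SAMPLE = b"cdatetime\ndate\np0\n(S'\\x07\\xe1\\x02\\x10'\np1\ntp2\nRp3\n."
rental_date = load_py2_pickle(SAMPLE)
print(rental_date)  # 2017-02-16
```

Pickled datetime.datetime or datetime.time values would need analogous branches in find_class. On Python versions where issue 22005 has been fixed, the wrapper should be unnecessary.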


This will work for me:

t = pickle.loads(dumpstr, encoding="latin-1")

Output:
<__main__.test object at 0xf7095fec>
t.__dict__={'x': 'test ¢'}
test ¢

Tested with Python 3.4.2
