Python multiprocessing and database access with pyodbc "is not safe"?

Posted: 2021-04-07 20:44:22

The Problem:

I am getting the following traceback and don't understand what it means or how to fix it:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Python26\lib\multiprocessing\forking.py", line 342, in main
    self = load(from_parent)
  File "C:\Python26\lib\pickle.py", line 1370, in load
    return Unpickler(file).load()
  File "C:\Python26\lib\pickle.py", line 858, in load
    dispatch[key](self)
  File "C:\Python26\lib\pickle.py", line 1083, in load_newobj
    obj = cls.__new__(cls, *args)
TypeError: object.__new__(pyodbc.Cursor) is not safe, use pyodbc.Cursor.__new__()

The situation:

I've got a SQL Server database full of data to be processed. I'm trying to use the multiprocessing module to parallelize the work and take advantage of the multiple cores on my computer. My general class structure is as follows:

  • MyManagerClass
    • This is the main class, where the program starts.
    • It creates two multiprocessing.Queue objects, one work_queue and one write_queue.
    • It also creates and launches the other processes, then waits for them to finish.
    • NOTE: this is not an extension of multiprocessing.managers.BaseManager().
  • MyReaderClass
    • This class reads the data from the SQL Server database.
    • It puts items in the work_queue.
  • MyWorkerClass
    • This is where the work processing happens.
    • It gets items from the work_queue and puts completed items in the write_queue.
  • MyWriterClass
    • This class is in charge of writing the processed data back to the SQL Server database.
    • It gets items from the write_queue.

The idea is that there will be one manager, one reader, one writer, and many workers.

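A minimal runnable sketch of that layout (Python 3 syntax; the reading, processing, and writing bodies are invented stand-ins, not the real code):

import multiprocessing

SENTINEL = "STOP"  # hypothetical end-of-stream marker

class MyReaderClass(multiprocessing.Process):
    def __init__(self, work_queue, n_workers):
        super().__init__()
        self.work_queue = work_queue
        self.n_workers = n_workers

    def run(self):
        for item in range(10):           # stand-in for rows read from SQL Server
            self.work_queue.put(item)
        for _ in range(self.n_workers):  # one sentinel per worker
            self.work_queue.put(SENTINEL)

class MyWorkerClass(multiprocessing.Process):
    def __init__(self, work_queue, write_queue):
        super().__init__()
        self.work_queue = work_queue
        self.write_queue = write_queue

    def run(self):
        while True:
            item = self.work_queue.get()
            if item == SENTINEL:
                break
            self.write_queue.put(item * 2)  # stand-in for the real processing
        self.write_queue.put(SENTINEL)

class MyWriterClass(multiprocessing.Process):
    def __init__(self, write_queue, n_workers):
        super().__init__()
        self.write_queue = write_queue
        self.n_workers = n_workers

    def run(self):
        finished = 0
        while finished < self.n_workers:
            item = self.write_queue.get()
            if item == SENTINEL:
                finished += 1
            else:
                print("would write back to SQL Server:", item)

if __name__ == "__main__":
    work_queue = multiprocessing.Queue()
    write_queue = multiprocessing.Queue()
    n_workers = multiprocessing.cpu_count()
    processes = [MyReaderClass(work_queue, n_workers),
                 MyWriterClass(write_queue, n_workers)]
    processes += [MyWorkerClass(work_queue, write_queue) for _ in range(n_workers)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
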
Other details:

I get the traceback twice in stderr, so I'm thinking that it happens once for the reader and once for the writer. My worker processes get created fine, but just sit there until I send a KeyboardInterrupt because they have nothing in the work_queue.

Both the reader and writer have their own connection to the database, created on initialization.

Solution:

Thanks to Mark and Ferdinand Beyer for their answers and questions that led to this solution. They rightly pointed out that the Cursor object is not "pickle-able", and pickling is the method that multiprocessing uses to pass information between processes.

The issue with my code was that MyReaderClass(multiprocessing.Process) and MyWriterClass(multiprocessing.Process) both connected to the database in their __init__() methods. I created both these objects (i.e. called their __init__() methods) in MyManagerClass, then called start().

So it would create the connection and cursor objects, then try to send them to the child process via pickle. My solution was to move the instantiation of the connection and cursor objects to the run() method, which isn't called until the child process is fully created.

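In code, the change looks roughly like this. This is a sketch rather than my original source: the connection string, table, and column names are invented placeholders. MyWriterClass changes the same way, keeping only picklable configuration in __init__() and opening its own connection inside run().

import multiprocessing
import pyodbc

class MyReaderClass(multiprocessing.Process):
    def __init__(self, work_queue, conn_str):
        super().__init__()
        self.work_queue = work_queue
        self.conn_str = conn_str  # a plain string, so it pickles fine
        # Do NOT connect here: __init__() runs in the parent process, and
        # the resulting Connection/Cursor can't be pickled over to the child.

    def run(self):
        # run() executes inside the child process, so the connection is
        # created where it is used and never crosses a process boundary.
        connection = pyodbc.connect(self.conn_str)
        cursor = connection.cursor()
        cursor.execute("SELECT id, payload FROM work_table")  # placeholder query
        for row in cursor:
            self.work_queue.put(tuple(row))  # plain values pickle fine
        connection.close()
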
3 Answers

#1 (8 votes)

Multiprocessing relies on pickling to communicate objects between processes. The pyodbc connection and cursor objects cannot be pickled.

>>> cPickle.dumps(aCursor)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.5/copy_reg.py", line 69, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle Cursor objects
>>> cPickle.dumps(dbHandle)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.5/copy_reg.py", line 69, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle Connection objects

"It puts items in the work_queue", what items? Is it possible the cursor object is getting passed as well?

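One way to find out is to try pickling a queue item by hand; anything that references the cursor, even indirectly, fails the same way. A sketch (the DSN and the items are invented):

import pickle
import pyodbc

connection = pyodbc.connect("DSN=mydsn")     # placeholder connection string
cursor = connection.cursor()

plain_item = (42, "some payload")            # only basic types
tainted_item = {"id": 42, "source": cursor}  # drags the cursor along

pickle.dumps(plain_item)                     # succeeds
try:
    pickle.dumps(tainted_item)
except TypeError as exc:
    print("unpicklable:", exc)               # the same error multiprocessing hits
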
#2 (3 votes)

The error is raised within the pickle module, so somewhere your DB cursor object gets pickled and unpickled (serialized to storage and deserialized back into a Python object).

I guess that pyodbc.Cursor does not support pickling. Why should you try to persist the cursor object anyway?

Check if you use pickle somewhere in your work chain or if it is used implicitly.

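If it isn't obvious where pickling happens, one option is to pickle each item explicitly before enqueueing it, so the failure surfaces in your own code rather than deep inside multiprocessing. checked_put below is a hypothetical helper, not part of any library:

import pickle

def checked_put(queue, item):
    # Fail loudly here if item can't cross a process boundary.
    pickle.dumps(item)  # raises TypeError for unpicklable items like cursors
    queue.put(item)
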
#3 (1 vote)

pyodbc has Python DB-API threadsafety level 1. This means threads cannot share connections; only the module itself is threadsafe.

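The level is exposed through the module attributes that the DB-API spec (PEP 249) requires, so it can be checked directly:

import pyodbc

print(pyodbc.apilevel)      # '2.0'
print(pyodbc.threadsafety)  # 1: threads may share the module, but not connections
print(pyodbc.paramstyle)    # 'qmark'
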
I don't think an underlying thread-safe ODBC driver makes a difference here. The failure is in the Python code, as the pickling error shows.
