I am using this code to scrape an API:
submissions = get_submissions(1)
with futures.ProcessPoolExecutor(max_workers=4) as executor:
    # or: with futures.ThreadPoolExecutor(max_workers=4) as executor:
    for s in executor.map(map_func, submissions):
        collection_front.update({"time_recorded": time_recorded},
                                {'$push': {"thread_list": s}}, upsert=True)
It works great (and fast) with threads, but when I try to use processes I get a full queue and this error:
File "/usr/local/lib/python3.4/dist-packages/praw/objects.py", line 82, in __getattr__
if not self.has_fetched:
RuntimeError: maximum recursion depth exceeded
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python3.4/threading.py", line 920, in _bootstrap_inner
self.run()
File "/usr/lib/python3.4/threading.py", line 868, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.4/concurrent/futures/process.py", line 251, in _queue_management_worker
shutdown_worker()
File "/usr/lib/python3.4/concurrent/futures/process.py", line 209, in shutdown_worker
call_queue.put_nowait(None)
File "/usr/lib/python3.4/multiprocessing/queues.py", line 131, in put_nowait
return self.put(obj, False)
File "/usr/lib/python3.4/multiprocessing/queues.py", line 82, in put
raise Full
queue.Full
Traceback (most recent call last):
File "reddit_proceses.py", line 64, in <module>
for s in executor.map(map_func, submissions):
File "/usr/lib/python3.4/concurrent/futures/_base.py", line 549, in result_iterator
yield future.result()
File "/usr/lib/python3.4/concurrent/futures/_base.py", line 402, in result
return self.__get_result()
File "/usr/lib/python3.4/concurrent/futures/_base.py", line 354, in __get_result
raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
Note that originally the processes worked great and very fast for small data retrievals, but now they're not working at all. Is this a bug, or what is going on that makes the PRAW object cause a recursion error with processes but not with threads?
1 Answer
#1
I had a similar problem moving from threads to processes, except I was using executor.submit. I think this might be the same problem you have, but I can't be sure because I don't know in what context your code is running.
In my case, what happened was this: I was running my code as a script, and I didn't use the always-recommended if __name__ == "__main__": guard. It looks like when the executor starts a new process, Python loads the .py file and runs the function specified in submit. Because it loads the file, any code at the top level of the main file (not inside a function or inside that if statement) gets run too, so each new process would spawn another process, resulting in infinite recursion.
It looks like this doesn't happen with threads.