How does the open_spider method get triggered on an item pipeline?

Date: 2022-02-06 23:07:40

I am using scrapy 0.20 with python 2.7

According to scrapy architecture, the spider sends requests to the engine. Then, after the whole crawling process, the item goes through the item pipeline.

So, as I understand it, the item pipeline has nothing to do when the spider opens or closes, and item pipeline components have no way of knowing when that happens. How, then, can the open_spider method exist in item pipeline components, as this page says?

2 answers

#1


0  

It's saying that you can implement open_spider() and close_spider() as callbacks if you want to. If you don't, your item pipeline will operate without any awareness of when the spider opens and closes. If you do implement these methods, they will be called when the spider opens and closes. So if you want your item pipeline to take special action when the spider opens or closes, you can make that happen by writing these methods in your pipeline.
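
As an illustration of the optional callbacks described above, here is a minimal sketch of a pipeline that opens a file when the spider opens and closes it when the spider closes. The class name `JsonLinesPipeline`, the `path` argument, and the default file name `items.jl` are assumptions made for this example; only the method names `open_spider`, `close_spider`, and `process_item` come from the item pipeline interface. It is a plain class, so it should behave the same on Python 2.7 and Python 3.

```python
import json


class JsonLinesPipeline(object):
    """Hypothetical pipeline: writes every item as one JSON line."""

    def __init__(self, path="items.jl"):
        # `path` is an assumption for this sketch; the default lets the
        # class be instantiated without arguments.
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Optional hook: called when the spider is opened.
        self.file = open(self.path, "w")

    def close_spider(self, spider):
        # Optional hook: called when the spider is closed.
        self.file.close()

    def process_item(self, item, spider):
        # Required method: called for every item the spider yields.
        self.file.write(json.dumps(dict(item)) + "\n")
        return item
```

If the two hooks were left out, the class would still be a valid pipeline as long as it defined process_item(); it just would not be told when the spider starts or finishes. The pipeline would still need to be listed in the project's ITEM_PIPELINES setting to be used.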

#2


0  

Having examined the architecture diagram and read the data flow description, I noted the following:

  1. The data flow description does not explicitly state that spiders do not communicate with the item pipeline.
  2. Spiders and the item pipeline are linked via the Scrapy engine, so presumably the engine is the component that informs the item pipeline that a spider has opened or closed, provided open_spider() or close_spider() has been implemented.
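
To make the second point concrete, here is a rough sketch of how a coordinating component could notify pipelines that define the hooks and skip those that do not. This is not Scrapy's actual engine code; `PipelineRunner` and `_call_if_defined` are hypothetical names invented for this illustration.

```python
class PipelineRunner(object):
    """Hypothetical stand-in for the component that drives the pipelines."""

    def __init__(self, pipelines):
        self.pipelines = list(pipelines)

    def _call_if_defined(self, name, spider):
        # Invoke the hook only on pipelines that actually implement it.
        for pipeline in self.pipelines:
            hook = getattr(pipeline, name, None)
            if callable(hook):
                hook(spider)

    def open_spider(self, spider):
        self._call_if_defined("open_spider", spider)

    def close_spider(self, spider):
        self._call_if_defined("close_spider", spider)

    def process_item(self, item, spider):
        # Every pipeline must provide process_item(); chain them in order.
        for pipeline in self.pipelines:
            item = pipeline.process_item(item, spider)
        return item
```

The point of the sketch is that the pipeline never has to watch the spider itself: whatever opens and closes the spider can simply call the hooks on any pipeline that has them.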
