Python Scrapy: how do I define a pipeline for an item?

Date: 2022-10-09 14:29:23

I am using Scrapy to crawl different sites. For each site I have an Item, because different information is extracted from each one.

For example, I have a generic pipeline (most of the information is the same), but now I am crawling some Google search responses and the pipeline must be different.

For example:

GenericItem uses GenericPipeline

GoogleItem, however, uses GoogleItemPipeline. When the spider is crawling, it tries to use GenericPipeline instead of GoogleItemPipeline. How can I specify which pipeline the Google spider must use?

1 Answer

#1


Right now there is only one way: check the Item type inside the pipeline, and either process the item or return it "as is".

pipelines.py:

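from grabbers.items import FeedItem


class StoreFeedPost(object):
    """Processes FeedItem instances and passes every other item through unchanged."""

    # Signature used by current Scrapy releases: process_item(item, spider).
    def process_item(self, item, spider):
        if isinstance(item, FeedItem):
            # process it...
            pass

        return item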

items.py:

# scrapy.item.Item is the base class in current Scrapy releases
# (ScrapedItem existed only in early versions).
from scrapy.item import Item


class FeedItem(Item):
    pass
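
To tie this back to the question, here is a minimal sketch of the same type check applied to the GenericItem / GoogleItem case. It is only an illustration under a few assumptions: plain dataclasses stand in for the Scrapy items so the snippet runs without Scrapy installed, the fields (`url`, `query`, `seen_by`) are hypothetical, and `run_pipelines` is a made-up helper that imitates every enabled pipeline receiving every item in order.

```python
from dataclasses import dataclass, field


@dataclass
class GenericItem:
    # Hypothetical fields, for illustration only.
    url: str
    seen_by: list = field(default_factory=list)


@dataclass
class GoogleItem:
    # Hypothetical fields, for illustration only.
    query: str
    seen_by: list = field(default_factory=list)


class GenericPipeline:
    def process_item(self, item, spider):
        if not isinstance(item, GenericItem):
            return item  # not ours: return it "as is"
        item.seen_by.append("GenericPipeline")
        return item


class GoogleItemPipeline:
    def process_item(self, item, spider):
        if not isinstance(item, GoogleItem):
            return item  # not ours: return it "as is"
        item.seen_by.append("GoogleItemPipeline")
        return item


def run_pipelines(item, pipelines, spider=None):
    """Hypothetical helper: pass one item through each pipeline in order."""
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider)
    return item


pipelines = [GenericPipeline(), GoogleItemPipeline()]
generic = run_pipelines(GenericItem(url="http://example.com"), pipelines)
google = run_pipelines(GoogleItem(query="scrapy"), pipelines)
print(generic.seen_by)  # ['GenericPipeline']
print(google.seen_by)   # ['GoogleItemPipeline']
```

In a real project both pipeline classes would be enabled through the `ITEM_PIPELINES` setting, so each of them sees every item; the `isinstance` check is what keeps each one from touching items it was not written for.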
