Q: How can I use different pipelines for different spiders in a single Scrapy project?

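Stack Overflow user

Asked 2011-12-04 10:08:26

8 answers, 28.9K views, 0 followers, 96 votes

I have a Scrapy project that contains multiple spiders. Is there any way to define which pipelines are used for which spider? Not all of the pipelines I have defined apply to every spider.

Thanks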

python

scrapy

web-crawler

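8 Answers

Stack Overflow user

Accepted answer

Posted 2013-01-05 06:13:05

Building on the solution from Pablo Hoffman, you can use the following decorator on the process_item method of a pipeline object so that it checks the spider's pipeline attribute to decide whether the step should run. For example: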

import functools
import logging

def check_spider_pipeline(process_item_method):

    @functools.wraps(process_item_method)
    def wrapper(self, item, spider):

        # message template for debugging
        msg = '%%s %s pipeline step' % (self.__class__.__name__,)

        # if class is in the spider's pipeline, then use the
        # process_item method normally.
        if self.__class__ in spider.pipeline:
            spider.log(msg % 'executing', level=logging.DEBUG)
            return process_item_method(self, item, spider)

        # otherwise, just return the untouched item (skip this step in
        # the pipeline)
        else:
            spider.log(msg % 'skipping', level=logging.DEBUG)
            return item

    return wrapper

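For this decorator to work, the spider must have a pipeline attribute holding a container of the pipeline classes you want to use to process its items, for example: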

class MySpider(BaseSpider):

    pipeline = set([
        pipelines.Save,
        pipelines.Validate,
    ])

    def parse(self, response):
        # insert scrapy goodness here
        return item

Then, in the pipelines.py file:

class Save(object):

    @check_spider_pipeline
    def process_item(self, item, spider):
        # do saving here
        return item

class Validate(object):

    @check_spider_pipeline
    def process_item(self, item, spider):
        # do validating here
        return item

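All pipeline classes must still be listed in ITEM_PIPELINES in the settings, in the correct order. It would be better to change this so that the order could be specified on the spider as well.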
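To see the skip behavior without running a crawl, here is a minimal sketch that exercises the same decorator idea against a stand-in spider. StubSpider and run_pipelines are hypothetical helpers written for this illustration only (they are not part of Scrapy), and the sketch logs through the standard logging module instead of spider.log so that it runs without Scrapy installed.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def check_spider_pipeline(process_item_method):
    """Run process_item only if the pipeline class is listed on the spider."""

    @functools.wraps(process_item_method)
    def wrapper(self, item, spider):
        if self.__class__ in spider.pipeline:
            logger.debug("executing %s pipeline step", self.__class__.__name__)
            return process_item_method(self, item, spider)
        # Not listed on this spider: pass the item through untouched.
        logger.debug("skipping %s pipeline step", self.__class__.__name__)
        return item

    return wrapper


class Validate:
    @check_spider_pipeline
    def process_item(self, item, spider):
        item["validated"] = True
        return item


class Save:
    @check_spider_pipeline
    def process_item(self, item, spider):
        item["saved"] = True
        return item


class StubSpider:
    """Hypothetical stand-in exposing only what the decorator reads."""
    name = "stub"
    pipeline = {Validate}  # Save is deliberately left out


def run_pipelines(item, spider, steps):
    """Hypothetical helper: pass the item through each step in order."""
    for step in steps:
        item = step.process_item(item, spider)
    return item


result = run_pipelines({}, StubSpider(), [Validate(), Save()])
print(result)  # {'validated': True}
```

Because Save is not in StubSpider.pipeline, its process_item body never runs and the item passes through unchanged.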

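38 votes

Stack Overflow user

Posted 2016-01-07 11:53:44

Just remove all pipelines from the main settings and use this inside the spider.

This defines the pipelines to use on a per-spider basis: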

class testSpider(InitSpider):
    name = 'test'
    custom_settings = {
        'ITEM_PIPELINES': {
            'app.MyPipeline': 400
        }
    }
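The integer assigned to each pipeline in ITEM_PIPELINES controls ordering: items pass through lower-valued pipelines first. The sketch below only illustrates that rule for two hypothetical per-spider settings dictionaries. The dotted paths and the execution_order helper are assumptions made for the example, and the helper is not how Scrapy itself loads pipelines.

```python
# Hypothetical custom_settings values for two spiders; the dotted
# paths are placeholders, not real modules.
SPIDER_A_SETTINGS = {
    "ITEM_PIPELINES": {
        "app.pipelines.Save": 800,
        "app.pipelines.Validate": 300,
    }
}
SPIDER_B_SETTINGS = {
    "ITEM_PIPELINES": {
        "app.pipelines.Validate": 300,
    }
}


def execution_order(custom_settings):
    """Illustrative helper: list pipeline paths from lowest to highest value."""
    pipelines = custom_settings.get("ITEM_PIPELINES", {})
    return [path for path, _ in sorted(pipelines.items(), key=lambda kv: kv[1])]


order_a = execution_order(SPIDER_A_SETTINGS)
order_b = execution_order(SPIDER_B_SETTINGS)
print(order_a)  # ['app.pipelines.Validate', 'app.pipelines.Save']
print(order_b)  # ['app.pipelines.Validate']
```

With this layout, spider A validates and then saves, while spider B only validates.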

162 votes

Stack Overflow user

Posted 2014-12-27 23:52:49

You can use the spider's name attribute inside the pipeline:

class CustomPipeline(object):

    def process_item(self, item, spider):
        if spider.name == 'spider1':
            # do something
            return item
        return item

Defining all of your pipelines this way achieves what you want.
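If several pipelines need the same name check, one possible way to avoid repeating it is a small base class. This is only a sketch: SpiderFilteredPipeline, spider_names, handle_item, PricePipeline, and StubSpider are all hypothetical names introduced here, not Scrapy APIs.

```python
class SpiderFilteredPipeline:
    """Hypothetical base class: only process items from named spiders."""

    # Subclasses list the spider names they apply to.
    spider_names = frozenset()

    def process_item(self, item, spider):
        if spider.name in self.spider_names:
            return self.handle_item(item, spider)
        # Any other spider: pass the item through unchanged.
        return item

    def handle_item(self, item, spider):
        raise NotImplementedError


class PricePipeline(SpiderFilteredPipeline):
    spider_names = frozenset({"spider1"})

    def handle_item(self, item, spider):
        item["price_checked"] = True
        return item


class StubSpider:
    """Hypothetical stand-in exposing only the name attribute."""

    def __init__(self, name):
        self.name = name


pipeline = PricePipeline()
handled = pipeline.process_item({}, StubSpider("spider1"))
skipped = pipeline.process_item({}, StubSpider("spider2"))
print(handled)  # {'price_checked': True}
print(skipped)  # {}
```

The trade-off is the same as in the answer above: every pipeline still runs for every spider, and each one decides for itself whether to act.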

14 votes

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/8372703
