Understanding how renaming images works in Scrapy

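Stack Overflow user

Asked 2018-07-27 00:28:17

Answers: 1, Views: 1.2K, Followers: 0, Votes: 2

I have looked at all the questions here, but I still don't understand.

The code below already does everything I need except rename the image, so I tried changing the name in the items.py file; please check the comments inside.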

settings.py

SPIDER_MODULES = ['xxx.spiders']
NEWSPIDER_MODULE = 'xxx.spiders'
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
IMAGES_STORE = '/home/magicnt/xxx/images'

items.py

import scrapy

class XxxItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()
    image_urls = scrapy.Field()
    # images = scrapy.Field()  # <--- with this line it works, using the default image names
    images = title  # <--- I tried to rename here, but it does not work

spider.py

from xxx.items import XxxItem
import scrapy
from scrapy.pipelines.images import ImagesPipeline
from scrapy.exceptions import DropItem

class CoverSpider(scrapy.Spider):
    name = "pyimagesearch-cover-spider"
    start_urls = ['https://xxx.com.br/product']
    def parse(self, response):
        for bimb in response.css('#mod_imoveis_result'):
            imageURL = bimb.xpath('./div[@id="g-img-imo"]/div[@class="img_p_results"]/img/@src').extract_first()
            title = bimb.css('#titulo_imovel::text').extract_first()
            yield {
                'image_urls' : [response.urljoin(imageURL)],
                'title' : title
            }
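        next_page = response.xpath('//a[contains(@class, "num_pages") and contains(@class, "pg_number_next")]/@href').extract_first()
        # Only follow the link when a next page exists (extract_first() returns None on the last page)
        if next_page:
            yield response.follow(next_page, self.parse)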

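My goal is to rename each downloaded image using the title from the item. Any suggestions toward that goal are welcome.

I am brand new to Python and OO. I usually write procedural PHP, but I have realized how good Scrapy can be; it just takes a little patience and help.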

python

scrapy

1 Answer

Stack Overflow user

Answered 2018-07-27 09:40:07

My code is based on "Scrapy Image Pipeline: How to rename images?". I tested it a week ago and it works on my own crawler.

import scrapy
from scrapy.pipelines.images import ImagesPipeline

# This pipeline is designed for an item with multiple images
class ImagesWithNamesPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # Values in the "image_names" field must end with ".jpg".
        # You can change "image_names" to your own image-name field,
        # but it must be a list with one name per URL.
        for image_url, image_name in zip(item[self.IMAGES_URLS_FIELD], item["image_names"]):
            # Carry the desired file name along with the request
            yield scrapy.Request(url=image_url, meta={"image_name": image_name})

    # The keyword-only "item" argument is passed by Scrapy 2.4 and later
    def file_path(self, request, response=None, info=None, *, item=None):
        # Return the path, relative to IMAGES_STORE, where the image is saved
        image_name = request.meta["image_name"]
        return image_name
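
To make the wiring concrete, here is a minimal sketch of the settings change and of the item shape the pipeline above expects. The module path `xxx.pipelines` and the example URL and file name are assumptions for illustration; adjust them to wherever the class is actually saved and to what the spider really scrapes.

```python
# settings.py: swap the stock pipeline for the custom one.
# "xxx.pipelines" is an assumed module path, not something from the question.
ITEM_PIPELINES = {"xxx.pipelines.ImagesWithNamesPipeline": 1}
IMAGES_STORE = "/home/magicnt/xxx/images"

# Shape of an item the pipeline expects (values are made up):
example_item = {
    "image_urls": ["https://xxx.com.br/img/1.jpg"],
    "image_names": ["house-1.jpg"],
}

# zip() pairs each URL with its name, so both lists should have the same length;
# any extra entries in the longer list are silently ignored.
pairs = list(zip(example_item["image_urls"], example_item["image_names"]))
print(pairs)
```

In the spider from the question, this would mean yielding an `image_names` list next to `image_urls` instead of only `title`.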

Here is how ImagesPipeline works:

The pipeline runs image_downloaded -> get_images -> file_path, in that order ("->" means "calls").

image_downloaded: saves the images returned by get_images by calling persist_file
get_images: converts the image to JPEG
file_path: returns the relative path of the image
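
The call chain above can be illustrated with a toy model. This is not Scrapy's code: the class names, the simplified method signatures, and the in-memory `saved` dict are all assumptions made so the sketch runs without Scrapy installed. It only shows why overriding file_path alone changes the saved name.

```python
class ToyImagesPipeline:
    """Toy stand-in for the call chain described above; not Scrapy's real code."""

    def __init__(self):
        self.saved = {}  # path -> bytes, stands in for the storage backend

    def file_path(self, url):
        # Default name; the real pipeline derives it from a hash of the URL
        return "full/default.jpg"

    def get_images(self, url, raw):
        # The real method converts to JPEG; here the bytes pass through unchanged
        yield self.file_path(url), raw

    def persist_file(self, path, data):
        self.saved[path] = data

    def image_downloaded(self, url, raw):
        # Save every (path, data) pair that get_images produces
        for path, data in self.get_images(url, raw):
            self.persist_file(path, data)


class RenamingPipeline(ToyImagesPipeline):
    def file_path(self, url):
        # Overriding only this method is enough to change the stored name
        return "my-title.jpg"


pipeline = RenamingPipeline()
pipeline.image_downloaded("https://xxx.com.br/a.jpg", b"fake-bytes")
print(list(pipeline.saved))
```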

I looked through the source code of ImagesPipeline and found no special field for renaming images. By default, Scrapy names them like this:

def file_path(self, request, response=None, info=None):
    # Simplified excerpt: the name is the SHA-1 hash of the request URL
    image_guid = hashlib.sha1(to_bytes(request.url)).hexdigest()
    return 'full/%s.jpg' % (image_guid)
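
To see what that default naming produces, here is a standalone sketch that reproduces it with only the standard library. The function name `default_image_path` and the example URL are made up for illustration, and UTF-8 encoding is assumed in place of Scrapy's `to_bytes` helper.

```python
import hashlib


def default_image_path(url: str) -> str:
    """Mimic the default naming: SHA-1 of the URL, stored under full/."""
    image_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return "full/%s.jpg" % image_guid


path = default_image_path("https://xxx.com.br/product/photo.jpg")
print(path)  # "full/" followed by 40 hex characters and ".jpg"
```

This is why the downloaded files end up with opaque hash names unless file_path is overridden.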

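Therefore, we should override the file_path method. According to the source code of FilesPipeline, which ImagesPipeline inherits from, we only need to return the relative path, and persist_file will take care of the rest.

Votes: 3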
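
Since the question wants the file named after the item's title, the title usually needs cleaning before it can serve as a file name. The helper below is a hypothetical sketch, not part of Scrapy or of the answer's code: `title_to_image_name` and its `default` parameter are names invented here.

```python
import re


def title_to_image_name(title, default="untitled"):
    """Turn a scraped title into a file name ending in .jpg (hypothetical helper)."""
    # Collapse every run of characters that are not word characters,
    # dots, or dashes into a single underscore
    cleaned = re.sub(r"[^\w.-]+", "_", (title or "").strip()).strip("_")
    # Fall back to a placeholder when the title is missing or empty
    return (cleaned or default) + ".jpg"


print(title_to_image_name("Casa 3 quartos / Centro"))  # Casa_3_quartos_Centro.jpg
```

The spider could then yield something like `'image_names': [title_to_image_name(title)]` alongside `'image_urls'`. Note that two items with the same title would map to the same file name, so one image would overwrite the other.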

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/51543561
