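I have read all the questions here, but I still don't understand.
Actually, the code below does everything I need except renaming the image, so I tried to change the name in the items.py
file; please check the comments in it.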
settings.py
SPIDER_MODULES = ['xxx.spiders']
NEWSPIDER_MODULE = 'xxx.spiders'
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
IMAGES_STORE = '/home/magicnt/xxx/images'
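If the renaming ends up being done in a custom pipeline subclass, the stock entry in ITEM_PIPELINES would have to point at that subclass instead. A sketch, assuming a hypothetical class TitleImagesPipeline that lives in a hypothetical xxx/pipelines.py (neither exists in the project as posted):

```python
# settings.py (sketch)
# Assumption: xxx/pipelines.py defines a hypothetical TitleImagesPipeline
# that subclasses scrapy.pipelines.images.ImagesPipeline.
ITEM_PIPELINES = {'xxx.pipelines.TitleImagesPipeline': 1}
IMAGES_STORE = '/home/magicnt/xxx/images'
```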
items.py
import scrapy


class XxxItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()
    image_urls = scrapy.Field()
    # images = scrapy.Field()  # <--- with this line it works, using the default image names
    images = title  # <--- I tried to rename here, but it does not work
spider.py
from xxx.items import XxxItem
import scrapy
from scrapy.pipelines.images import ImagesPipeline
from scrapy.exceptions import DropItem


class CoverSpider(scrapy.Spider):
    name = "pyimagesearch-cover-spider"
    start_urls = ['https://xxx.com.br/product']

    def parse(self, response):
        # One result block per listing on the page
        for bimb in response.css('#mod_imoveis_result'):
            imageURL = bimb.xpath('./div[@id="g-img-imo"]/div[@class="img_p_results"]/img/@src').extract_first()
            title = bimb.css('#titulo_imovel::text').extract_first()
            yield {
                'image_urls': [response.urljoin(imageURL)],
                'title': title,
            }

        # Follow the "next page" link, if there is one
        next_page = response.xpath('//a[contains(@class, "num_pages") and contains(@class, "pg_number_next")]/@href').extract_first()
        if next_page:
            yield response.follow(next_page, self.parse)
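My goal is to rename each downloaded image using the title from its item. Any suggestions toward this goal are welcome.
I am completely new to Python and OO. I usually write procedural PHP, but I have realized how good Scrapy can be; I just need a little patience and help.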
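For the renaming goal above, one direction that may work is to leave items.py alone and subclass ImagesPipeline, because the stored file name comes from the pipeline's file_path method and not from the images field (the pipeline fills that field with download results). What follows is only a sketch under assumptions: the class name TitleImagesPipeline and the helper title_to_filename are made up for illustration, the title is carried along in request.meta, and the try/except around the imports is there only so the helper can be tried without Scrapy installed.

```python
import hashlib
import re

try:
    from scrapy import Request
    from scrapy.pipelines.images import ImagesPipeline
except ImportError:
    # Fallback so the helper below can be tried without Scrapy installed.
    Request = None
    ImagesPipeline = object


def title_to_filename(title, url):
    """Hypothetical helper: turn an item title into a safe .jpg file name."""
    # Replace every run of characters that is not a letter, digit,
    # underscore or hyphen with a single underscore.
    slug = re.sub(r'[^\w\-]+', '_', (title or '').strip()).strip('_')
    if not slug:
        # No usable title: fall back to a hash of the image URL.
        slug = hashlib.sha1(url.encode('utf-8')).hexdigest()
    # ImagesPipeline stores images as JPEG, hence the fixed extension.
    return slug + '.jpg'


class TitleImagesPipeline(ImagesPipeline):
    """Hypothetical subclass that names files after the item's title."""

    def get_media_requests(self, item, info):
        # Attach the title to each image request so file_path can read it.
        for url in item.get('image_urls', []):
            yield Request(url, meta={'title': item.get('title')})

    def file_path(self, request, response=None, info=None, *, item=None):
        # The returned path is relative to IMAGES_STORE.
        return 'full/' + title_to_filename(request.meta.get('title'), request.url)
```

Items that share a title would map to the same file name and overwrite each other, so a real version would probably need to append something unique, such as part of the URL hash. The subclass would also have to be listed in ITEM_PIPELINES in place of the stock ImagesPipeline.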
https://stackoverflow.com/questions/51543561