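For my Scrapy project, I'm currently using ImagesPipeline. The downloaded images are stored using the SHA1 hash of their URL as the filename.
How can I store the files using my own custom filenames?
What if my custom filename needs to include another scraped field from the same item? For example, using item['desc'] and item['image_url'] to build the image's filename. If I understand correctly, that would involve accessing other item fields from the image pipeline.
Any help would be appreciated.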
Answered 2011-06-01 04:11:47
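This is how I solved the problem in Scrapy 0.10. Look at the persist_image method of FSImagesStoreChangeableDirectory. The filename of the downloaded image is the key argument.

    import os

    # FSImagesStore, ImagesPipeline, settings and NotConfigured come from
    # Scrapy 0.10; the original answer does not show their imports.

    class FSImagesStoreChangeableDirectory(FSImagesStore):

        def persist_image(self, key, image, buf, info, append_path):
            # key is the filename; append_path is an extra directory argument
            # added by this override.
            absolute_path = self._get_filesystem_path(append_path + '/' + key)
            self._mkdir(os.path.dirname(absolute_path), info)
            image.save(absolute_path)

    class ProjectPipeline(ImagesPipeline):

        def __init__(self):
            # As written in the answer, the lookup starts after ImagesPipeline,
            # so ImagesPipeline.__init__ itself is skipped.
            super(ImagesPipeline, self).__init__()
            store_uri = settings.IMAGES_STORE
            if not store_uri:
                raise NotConfigured
            self.store = FSImagesStoreChangeableDirectory(store_uri)
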
Answered 2014-03-08 01:48:16
This is just an implementation of the answer for Scrapy 0.24 (edited), where image_key() is deprecated.

    from scrapy.contrib.pipeline.images import ImagesPipeline
    from scrapy.http import Request

    class MyImagesPipeline(ImagesPipeline):

        # Name of the full-size download
        def file_path(self, request, response=None, info=None):
            # item = request.meta['item']  # this way you can use every item field, not just the URL
            image_guid = request.url.split('/')[-1]
            return 'full/%s' % image_guid

        # Name of the thumbnail
        def thumb_path(self, request, thumb_id, response=None, info=None):
            # request.url is used here because response defaults to None
            image_guid = thumb_id + request.url.split('/')[-1]
            return 'thumbs/%s/%s.jpg' % (thumb_id, image_guid)

        def get_media_requests(self, item, info):
            for image in item['images']:
                # Add meta={'item': item} to make the item available in file_path
                yield Request(image)

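The question asks how to build the filename from another item field such as item['desc']. The commented-out request.meta['item'] line above hints at the mechanism; what remains is turning a free-text description into a safe filename. The following is only a sketch of that step: build_image_path is a hypothetical helper, not part of Scrapy, and the slug rules and the fixed .jpg extension (ImagesPipeline re-encodes images as JPEG) are assumptions.

```python
import hashlib
import re


def build_image_path(desc, image_url, prefix="full"):
    """Return a store-relative path such as 'full/red-shoes.jpg'.

    Hypothetical helper: desc and image_url stand for item['desc'] and
    item['image_url'] from the question.
    """
    # Replace every run of characters other than ASCII letters, digits,
    # '_' and '-' with a single dash, then trim dashes at the ends.
    slug = re.sub(r"[^A-Za-z0-9_-]+", "-", desc).strip("-").lower()
    if not slug:
        # Nothing usable in desc: fall back to the SHA1 hash of the URL,
        # the naming scheme described in the question.
        slug = hashlib.sha1(image_url.encode("utf-8")).hexdigest()
    return "%s/%s.jpg" % (prefix, slug)


print(build_image_path("Red Shoes!", "http://example.com/img/abc123.png"))
```

Inside file_path this could be called as build_image_path(item['desc'], request.url) after reading item = request.meta['item'], provided get_media_requests yields Request(image, meta={'item': item}). Two items with the same description would map to the same path, so a real project may want to append an ID or part of the URL hash.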
Answered 2011-09-08 13:35:56
In 0.12, I solved it like this:

    from scrapy.contrib.pipeline.images import ImagesPipeline
    from scrapy.http import Request

    class MyImagesPipeline(ImagesPipeline):

        # Name of the full-size download
        def image_key(self, url):
            image_guid = url.split('/')[-1]
            return 'full/%s.jpg' % image_guid

        # Name of the thumbnail
        def thumb_key(self, url, thumb_id):
            image_guid = thumb_id + url.split('/')[-1]
            return 'thumbs/%s/%s.jpg' % (thumb_id, image_guid)

        def get_media_requests(self, item, info):
            # item['images'] is treated as a single URL here
            yield Request(item['images'])

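One thing to watch with url.split('/')[-1] is that it keeps any query string and any existing extension, so 'full/%s.jpg' can produce names like photo.jpg?size=large.jpg. A minimal sketch of a cleaner extraction, using only the standard library (image_stem_from_url is a hypothetical name, not a Scrapy function):

```python
import posixpath
from urllib.parse import urlparse


def image_stem_from_url(url):
    """Return the last path segment of url without query string or extension."""
    # urlparse separates the path from '?query' and '#fragment'.
    path = urlparse(url).path
    # basename takes the last segment; splitext drops a trailing '.ext'.
    stem, _ext = posixpath.splitext(posixpath.basename(path))
    return stem


url = "http://example.com/a/photo.jpg?size=large"
print(url.split('/')[-1])                      # photo.jpg?size=large
print('full/%s.jpg' % image_stem_from_url(url))  # full/photo.jpg
```

The result is empty when the URL path ends in a slash, so callers may want a fallback name for that case.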
https://stackoverflow.com/questions/6194041