I have a simple script that fetches image dimensions from a list of image URLs, but it is too slow once the list gets large (e.g. with 120 URLs it can take 10 seconds to run):
import requests
from PIL import Image
from io import BytesIO

def get_image_size(url):
    data = requests.get(url).content
    try:
        im = Image.open(BytesIO(data))
        size = im.size
    except IOError:  # response body was not a valid image
        size = False
    return size

list_images = ['https://example.com/img.png', ...]
for img in list_images:
    get_image_size(img)
I have already tried gevent, which cut the processing time by about 50%, but that is still not enough. Is there another way to make this script run faster?
The end goal is to get the 5 largest images in the dataset.
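(For comparison with the gevent attempt: the loop above can also be parallelized with a standard-library thread pool. A minimal sketch; `sizes_concurrently` and the injected `fetch` callable are illustrative names, not part of the original script:)

```python
from concurrent.futures import ThreadPoolExecutor

def sizes_concurrently(urls, fetch, max_workers=20):
    # Run the per-URL lookup in a thread pool: the work is network
    # I/O, which releases the GIL, so the requests overlap in time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))

# With the original function this would be called as:
# sizes_concurrently(list_images, get_image_size)
```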
Posted on 2018-07-29 17:33:25
You can use grequests (requests plus gevent), and instead of opening each image with Pillow to get its size, read the size from the HTTP response headers.
In general, performance will depend on your network connection, the server's speed, and the image sizes:
import grequests
import pprint

def downloadImages(images):
    result = {}
    # Issue all requests concurrently via gevent
    rs = (grequests.get(t) for t in images)
    downloads = grequests.map(rs, size=len(images))
    for download in downloads:
        if download is None:  # grequests.map yields None for failed requests
            continue
        _status = 200 == download.status_code
        _url = download.url
        if _status:
            for k, v in download.headers.items():
                if k.lower() == 'content-length':
                    result[_url] = v
                    break
        else:
            result[_url] = -1
    return result

if __name__ == '__main__':
    urls = [
        'https://b.tile.openstreetmap.org/12/2075/1409.png',
        'https://b.tile.openstreetmap.org/12/2075/1410.png',
        'https://b.tile.openstreetmap.org/12/2075/1411.png',
        'https://b.tile.openstreetmap.org/12/2075/1412.png'
    ]
    sizes = downloadImages(urls)
    pprint.pprint(sizes)
This returns:
{'https://b.tile.openstreetmap.org/12/2075/1409.png': '40472',
'https://b.tile.openstreetmap.org/12/2075/1410.png': '38267',
'https://b.tile.openstreetmap.org/12/2075/1411.png': '36338',
'https://b.tile.openstreetmap.org/12/2075/1412.png': '30467'}
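Since the stated goal is the 5 largest images, the returned dict can then be reduced with heapq.nlargest. Note the Content-Length values arrive as strings, so they must be compared as integers. A sketch, assuming a result dict shaped like the output above (`largest_images` is an illustrative name):

```python
import heapq

def largest_images(sizes, n=5):
    # Content-Length values are strings, so compare them as ints.
    # Returns (url, size) pairs, largest first.
    return heapq.nlargest(n, sizes.items(), key=lambda kv: int(kv[1]))
```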
https://stackoverflow.com/questions/51571729
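A further note: since only the Content-Length header is needed, an HTTP HEAD request avoids downloading the image bodies entirely. A minimal sketch with plain requests (`head_size` and `size_from_headers` are illustrative names, not from the answer above):

```python
import requests

def size_from_headers(headers):
    # Case-insensitive Content-Length lookup; -1 if the header is absent.
    lowered = {k.lower(): v for k, v in headers.items()}
    return int(lowered.get('content-length', -1))

def head_size(url, timeout=10):
    # HEAD transfers only the response headers, never the image bytes.
    resp = requests.head(url, allow_redirects=True, timeout=timeout)
    return size_from_headers(resp.headers) if resp.ok else -1
```

Be aware that some servers omit Content-Length or handle HEAD poorly, so keeping the GET-based approach above as a fallback is prudent.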