I am processing thousands of image URLs and want to use concurrent.futures.ProcessPoolExecutor to speed things up. Because some URLs are broken or the images are large, the processing function may hang or take an unexpectedly long time. I want to add a timeout to the processing function, say 10 seconds, so that these invalid images can be skipped.
I tried setting the timeout parameter of futures.as_completed, and it does raise a TimeoutError. However, the main process still seems to wait until the timed-out child process finishes. Is there any way to terminate the timed-out child process immediately and put the next URL into the pool?
from concurrent import futures

def process(url):
    # Some time-consuming image-processing operation
    return result

def main():
    urls = ['url1', 'url2', 'url3', ..., 'url100']
    with futures.ProcessPoolExecutor(max_workers=10) as executor:
        future_list = {executor.submit(process, url): url for url in urls}
        results = []
        try:
            for future in futures.as_completed(future_list, timeout=10):
                results.append(future.result())
        except futures.TimeoutError:
            print("timeout")
        print(results)

if __name__ == '__main__':
    main()
In the example above, suppose I have 100 URLs, 10 of which are invalid and may take a very long time. How can I get the list of processing results for the remaining 90 URLs?
Posted on 2020-02-03 22:29:44
Instead of using the concurrent.futures library, use the pebble module, which was developed precisely to overcome this limitation.
from pebble import ProcessPool
from concurrent.futures import TimeoutError

with ProcessPool() as pool:
    future = pool.schedule(function, args=(1, 2), timeout=5)
    try:
        result = future.result()  # blocks until results are ready
    except TimeoutError as error:
        print("Function took longer than %d seconds" % error.args[1])
https://stackoverflow.com/questions/60029706