Trying to scrape data from a website using asyncio and aiohttp, and running into a problem with await inside the for loop.
Here is my script:
async def get_page(session, x):
    async with session.get(f'https://disclosure.bursamalaysia.com/FileAccess/viewHtml?e={x}') as r:
        return await r.text()

async def get_all(session, urls):
    tasks = []
    sem = asyncio.Semaphore(1)
    count = 0
    for x in urls:
        count += 1
        task = asyncio.create_task(get_page(session, x))
        tasks.append(task)
        print(count, '-ID-', x, '|', end=' ')
    results = await asyncio.gather(*task)
    return results

async def main(urls):
    async with aiohttp.ClientSession() as session:
        data = await get_all(session, urls)
        return

if __name__ == '__main__':
    urls = titlelink
    results = asyncio.run(main(urls))
    print(results)
When the scraper breaks, this is the error it returns:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-3-5ac99108678c> in <module>
22 if __name__ == '__main__':
23 urls = titlelink
---> 24 results = asyncio.run(main(urls))
25 print(results)
~\AppData\Local\Programs\Python\Python38\lib\site-packages\nest_asyncio.py in run(future, debug)
30 loop = asyncio.get_event_loop()
31 loop.set_debug(debug)
---> 32 return loop.run_until_complete(future)
33
34 if sys.version_info >= (3, 6, 0):
~\AppData\Local\Programs\Python\Python38\lib\site-packages\nest_asyncio.py in run_until_complete(self, future)
68 raise RuntimeError(
69 'Event loop stopped before Future completed.')
---> 70 return f.result()
71
72 def _run_once(self):
~\AppData\Local\Programs\Python\Python38\lib\asyncio\futures.py in result(self)
176 self.__log_traceback = False
177 if self._exception is not None:
--> 178 raise self._exception
179 return self._result
180
~\AppData\Local\Programs\Python\Python38\lib\asyncio\tasks.py in __step(***failed resolving arguments***)
278 # We use the `send` method directly, because coroutines
279 # don't have `__iter__` and `__next__` methods.
--> 280 result = coro.send(None)
281 else:
282 result = coro.throw(exc)
<ipython-input-3-5ac99108678c> in main(urls)
17 async def main(urls):
18 async with aiohttp.ClientSession() as session:
---> 19 data = await get_all(session, urls)
20 return
21
<ipython-input-3-5ac99108678c> in get_all(session, urls)
12 tasks.append(task)
13 print(count,'-ID-',x,'|', end=' ')
---> 14 results = await asyncio.gather(*task)
15 return results
16
~\AppData\Local\Programs\Python\Python38\lib\asyncio\futures.py in __await__(self)
260 yield self # This tells Task to wait for completion.
261 if not self.done():
--> 262 raise RuntimeError("await wasn't used with future")
263 return self.result() # May raise too.
264
RuntimeError: await wasn't used with future
Is this error caused by putting await inside the for loop, or is it a server issue? Or is the way I wrote the script wrong? I'd appreciate it if anyone could point me in the right direction.
Posted on 2021-07-28 16:28:07
You can use multiprocessing to scrape multiple links at the same time (in parallel):
from multiprocessing import Pool

def scrape(url):
    # Scraper script
    ...

if __name__ == '__main__':  # guard is required on Windows, where workers are spawned
    p = Pool(10)
    # This "10" means that 10 URLs will be processed at the same time
    p.map(scrape, list_of_all_urls)
    p.terminate()
    p.join()
Here we map the scrape function over list_of_all_urls, and the Pool p takes care of executing each call concurrently. This is similar to a plain for loop over list_of_all_urls in simple Python, except that here the calls run concurrently. If there are 100 URLs and we specify Pool(20), it will take 5 passes (100/20), and 20 URLs will be processed in each one.
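As a minimal sketch of how this could look for the scraper in the question, assuming the blocking requests library is used inside each worker and that titlelink holds the e= IDs (the IDs below are made-up placeholders):

import requests  # assumption: a blocking HTTP client is fine inside each worker
from multiprocessing import Pool

def scrape(x):
    # Fetch one filing page; x is an e= ID as in the question's URL
    url = f'https://disclosure.bursamalaysia.com/FileAccess/viewHtml?e={x}'
    return requests.get(url).text

if __name__ == '__main__':
    titlelink = ['1000001', '1000002']  # placeholders; the real IDs come from elsewhere
    with Pool(10) as p:                 # 10 workers, so 10 URLs in flight at once
        pages = p.map(scrape, titlelink)
    print(len(pages), 'pages fetched')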
I believe this is the same as a previous question, and I think you can use multiprocessing there too. I know this is not the exact answer, but multiprocessing is easy and straightforward.
Posted on 2022-12-04 16:04:32
await asyncio.gather(*task)
should be:
await asyncio.gather(*tasks)
The exception actually comes from *task. Not sure what the point of that syntax would be, but it is certainly not what you want:
>>> t = asyncio.Task(asyncio.sleep(10))
>>> (*t,)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: await wasn't used with future
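Putting it together, a corrected sketch of the original script: the one-character fix above applied, main actually returning the data (the original returned None), and the Semaphore the question creates but never acquires now used to throttle requests (the limit of 10 is an assumption; the question used 1, and the placeholder IDs stand in for the asker's titlelink):

import asyncio
import aiohttp

async def get_page(session, sem, x):
    async with sem:  # throttle: at most N requests in flight at once
        async with session.get(f'https://disclosure.bursamalaysia.com/FileAccess/viewHtml?e={x}') as r:
            return await r.text()

async def get_all(session, urls):
    sem = asyncio.Semaphore(10)  # assumed limit; the question used 1
    tasks = [asyncio.create_task(get_page(session, sem, x)) for x in urls]
    return await asyncio.gather(*tasks)  # note: *tasks, not *task

async def main(urls):
    async with aiohttp.ClientSession() as session:
        return await get_all(session, urls)  # return the data instead of None

if __name__ == '__main__':
    titlelink = ['1000001', '1000002']  # placeholders; the real IDs come from elsewhere
    results = asyncio.run(main(titlelink))
    print(results)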
https://stackoverflow.com/questions/68563801