
Looping through links quickly in Python

Stack Overflow user
Asked on 2022-05-18 20:11:44
2 answers · 100 views · 0 followers · Score: 0
import requests
import json
from tqdm import tqdm

The list of links to loop through:

links = ['https://www.google.com/', 'https://www.google.com/', 'https://www.google.com/']

The loop that fetches each link with requests:

data = []
for link in tqdm(links):
    response = requests.get(link)
    data.append(response.json())

The for loop above works through the whole list of links, but when I try it on roughly 1,000 links it takes a long time. Any help would be appreciated.


2 Answers

Stack Overflow user

Accepted answer

Posted on 2022-05-18 21:05:00

The simplest fix is to make it multithreaded. The best approach is probably asynchronous I/O.

Multithreaded solution:

import requests
from tqdm.contrib.concurrent import thread_map

links = ['https://www.google.com/', 'https://www.google.com/', 'https://www.google.com/']

def get_data(url):
    response = requests.get(url)
    response = response.json()  # Do note this might fail at times
    return response

data = thread_map(get_data, links)
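
If you need to tune the level of concurrency, thread_map forwards max_workers to the underlying ThreadPoolExecutor; the value below is an arbitrary illustration, not a recommendation:

data = thread_map(get_data, links, max_workers=32)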

Or, without tqdm.contrib.concurrent.thread_map:

import requests
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

links = ['https://www.google.com/', 'https://www.google.com/', 'https://www.google.com/']

def get_data(url):
    response = requests.get(url)
    response = response.json()  # Do note this might fail at times
    return response

# Use a context manager so the executor is shut down cleanly
with ThreadPoolExecutor() as executor:
    data = list(tqdm(executor.map(get_data, links), total=len(links)))
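
Since response.json() might fail at times, as the comment above warns, you may want a hardened variant of get_data that skips bad URLs instead of aborting the whole batch. A minimal sketch (the timeout value and error handling here are illustrative assumptions, not part of the original code):

def get_data(url):
    # Illustrative hardening: return None on network or decode errors
    # instead of letting one bad link crash the entire run.
    try:
        response = requests.get(url, timeout=10)  # assumed 10 s timeout
        response.raise_for_status()
        return response.json()
    except (requests.RequestException, ValueError):
        return None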
Score: 1

Stack Overflow user

Posted on 2022-05-18 20:28:30

As suggested in the comments, you can use asyncio with aiohttp:

import asyncio
import aiohttp

urls = ["your", "links", "here"]

# create aio connector
conn = aiohttp.TCPConnector(limit_per_host=100, limit=0, ttl_dns_cache=300)

# set number of parallel requests - if you are requesting different domains you are likely to be able to set this higher, otherwise you may be rate limited
PARALLEL_REQUESTS = 10

# Create results array to collect results
results = []

async def gather_with_concurrency(n):
    # Create semaphore for async i/o  
    semaphore = asyncio.Semaphore(n)

    # create an aiohttp session using the previous connector
    session = aiohttp.ClientSession(connector=conn)

    # await logic for get request
    async def get(url):
        async with semaphore:
            async with session.get(url, ssl=False) as response:
                obj = await response.read()
                # once object is acquired we append to list
                results.append(obj)
    # wait for all requests to be gathered and then close session
    await asyncio.gather(*(get(url) for url in urls))
    await session.close()

# get async event loop
loop = asyncio.get_event_loop()
# run using number of parallel requests
loop.run_until_complete(gather_with_concurrency(PARALLEL_REQUESTS))
# Close connection
conn.close()

# loop through results and do something to them
for res in results:
    do_something(res)
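
A side note: asyncio.get_event_loop() is deprecated on Python 3.10+ when no loop is already running, so on modern Python the entry point would typically be asyncio.run (in that case the connector and session are best created inside the coroutine):

asyncio.run(gather_with_concurrency(PARALLEL_REQUESTS))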

I have tried to comment the code as much as possible.

I parse the responses with BS4 this way (inside the do_something logic), but it will really depend on your use case.
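
For illustration, a do_something built on BS4 might look like the sketch below; the helper is hypothetical, and the parsing step depends entirely on what you need from each page:

from bs4 import BeautifulSoup

def do_something(raw_bytes):
    # Hypothetical example: pull the page title out of the raw response body.
    soup = BeautifulSoup(raw_bytes, "html.parser")
    print(soup.title.string if soup.title else "no <title> found")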

Score: 1
Original content provided by Stack Overflow: https://stackoverflow.com/questions/72295410