I have a use case in which I need to download a large remote file in parts, using multiple threads. Each thread must run simultaneously (in parallel), grabbing a specific part of the file. The expectation is that once all the parts have been downloaded successfully, they are merged back into a single (original) file.
Perhaps the requests library could do the job, but I don't know how to turn it into a multithreaded solution that combines the chunks back together.
import requests

url = 'https://url.com/file.iso'
headers = {"Range": "bytes=0-1000000"}  # first megabyte
r = requests.get(url, headers=headers)
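For illustration, here is a minimal sketch (not from the original question; the URL is the placeholder above and the chunk size is an assumption) of how such ranged requests might be fanned out over a thread pool and stitched back together:

import concurrent.futures
import requests

URL = 'https://url.com/file.iso'  # placeholder URL from the question
CHUNK = 1_000_000                 # 1 MB per request (an assumption)

def fetch_range(bounds):
    # Each worker fetches one inclusive byte range.
    start, end = bounds
    headers = {'Range': f'bytes={start}-{end}'}
    return requests.get(URL, headers=headers).content

size = int(requests.head(URL).headers['Content-Length'])
ranges = [(s, min(s + CHUNK, size) - 1) for s in range(0, size, CHUNK)]

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    parts = pool.map(fetch_range, ranges)  # map() preserves input order

with open('file.iso', 'wb') as f:
    for part in parts:
        f.write(part)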
I was also considering using curl, with Python orchestrating the downloads, but I'm not sure that's the right way to go. It seems too complicated and strays from a plain Python solution. Something like this:
curl --range 200000000-399999999 -o file.iso.part2
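If Python really were to orchestrate curl, a rough sketch (file name, part size, and part count are assumptions here) might use subprocess together with a thread pool:

import concurrent.futures
import subprocess

URL = 'https://url.com/file.iso'   # placeholder
PART_SIZE = 200_000_000            # assumed 200 MB parts
N_PARTS = 4                        # assumed; derive from Content-Length in practice

def fetch_part(i):
    start = i * PART_SIZE
    end = start + PART_SIZE - 1
    # Let curl handle the ranged download of one part.
    subprocess.run(
        ['curl', '--range', f'{start}-{end}', '-o', f'file.iso.part{i}', URL],
        check=True,
    )

with concurrent.futures.ThreadPoolExecutor(max_workers=N_PARTS) as pool:
    list(pool.map(fetch_part, range(N_PARTS)))

with open('file.iso', 'wb') as out:
    for i in range(N_PARTS):
        with open(f'file.iso.part{i}', 'rb') as part:
            out.write(part.read())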
Can someone explain how you would go about doing this kind of thing, or post a code example in Python 3? I can usually find Python-related answers fairly easily, but the solution to this one seems to be eluding me.
Posted on 2019-10-26 14:49:38
Here's a version using Python 3 and asyncio. It's just an example and it can be improved, but it should give you everything you need.
get_size: sends a HEAD request to get the size of the file
download_range: downloads a single chunk
download: downloads all of the chunks and merges them

import asyncio
import concurrent.futures
import functools
import requests
import os

# WARNING:
# Here I'm pointing to a publicly available sample video.
# If you are planning on running this code, make sure the
# video is still available, as it might change location or get deleted.
# If necessary, replace it with a URL you know is working.
URL = 'https://download.samplelib.com/mp4/sample-30s.mp4'
OUTPUT = 'video.mp4'


async def get_size(url):
    # A HEAD request returns the headers (including Content-Length)
    # without downloading the body.
    response = requests.head(url)
    size = int(response.headers['Content-Length'])
    return size


def download_range(url, start, end, output):
    # Fetch one inclusive byte range and write it to its own part file.
    headers = {'Range': f'bytes={start}-{end}'}
    response = requests.get(url, headers=headers)

    with open(output, 'wb') as f:
        for part in response.iter_content(1024):
            f.write(part)


async def download(run, loop, url, output, chunk_size=1000000):
    file_size = await get_size(url)
    chunks = range(0, file_size, chunk_size)

    # Offload each blocking download to the thread pool executor.
    tasks = [
        run(
            download_range,
            url,
            start,
            start + chunk_size - 1,
            f'{output}.part{i}',
        )
        for i, start in enumerate(chunks)
    ]
    await asyncio.wait(tasks)

    # Concatenate the part files in order, deleting them as we go.
    with open(output, 'wb') as o:
        for i in range(len(chunks)):
            chunk_path = f'{output}.part{i}'

            with open(chunk_path, 'rb') as s:
                o.write(s.read())

            os.remove(chunk_path)


if __name__ == '__main__':
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=3)
    loop = asyncio.new_event_loop()
    run = functools.partial(loop.run_in_executor, executor)
    asyncio.set_event_loop(loop)

    try:
        loop.run_until_complete(
            download(run, loop, URL, OUTPUT)
        )
    finally:
        loop.close()
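As a quick follow-up (my addition, not part of the original answer), you could sanity-check the merged file by comparing its size against the server-reported Content-Length:

import os
import requests

def verify_size(url, output):
    # Compare the merged file's size with the server-reported Content-Length.
    expected = int(requests.head(url).headers['Content-Length'])
    actual = os.path.getsize(output)
    assert actual == expected, f'expected {expected} bytes, got {actual}'

verify_size(URL, OUTPUT)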
Posted on 2019-10-26 14:48:10
You can use grequests to download in parallel.
import grequests

URL = 'https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-10.1.0-amd64-netinst.iso'
CHUNK_SIZE = 104857600  # 100 MB

HEADERS = []
_start, _stop = 0, 0
for x in range(4):  # file size is > 300 MB, so we download in 4 parts
    _start = _stop
    _stop = CHUNK_SIZE * (x + 1)
    # Byte ranges are inclusive, so end each range at _stop - 1 to avoid
    # fetching the boundary byte twice.
    HEADERS.append({"Range": "bytes=%s-%s" % (_start, _stop - 1)})

rs = (grequests.get(URL, headers=h) for h in HEADERS)
downloads = grequests.map(rs)

# 'wb' rather than 'ab', so a re-run starts from a clean file.
with open('/tmp/debian-10.1.0-amd64-netinst.iso', 'wb') as f:
    for download in downloads:
        print(download.status_code)
        f.write(download.content)
PS: I haven't verified that the ranges are determined correctly, or that the md5sum of the download matches! Broadly speaking, this just shows how it works.
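To address the PS, here is a small helper (my addition, not part of the original answer) for computing the md5sum of the merged file, so it can be compared against the checksum published for the ISO:

import hashlib

def md5sum(path, block_size=1 << 20):
    # Stream the file in 1 MB blocks so large ISOs aren't loaded into memory.
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(block_size), b''):
            digest.update(block)
    return digest.hexdigest()

print(md5sum('/tmp/debian-10.1.0-amd64-netinst.iso'))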
Posted on 2022-06-15 14:27:31
The best way I've found to do this is with a module called pySmartDL.
Step 1: pip install pySmartDL
Step 2: to download the file, you can use
from pySmartDL import SmartDL
obj = SmartDL(url, destination)
obj.start()
Note: by default, this gives you a download progress meter.
If you need to hook the download progress to a GUI instead, you can use
import time

obj = SmartDL(url, dest, progress_bar=False)
obj.start(blocking=False)

while not obj.isFinished():
    download_percentage = round(obj.get_progress() * 100, 2)
    time.sleep(0.2)
    print(download_percentage)
If you want to use more threads, you can use
obj = SmartDL(url, destination, threads=7)  # by default, threads=5
obj.start()
You can find more features on the project page:
Downloads: http://pypi.python.org/pypi/pySmartDL/
Documentation: http://itaybb.github.io/pySmartDL/
Project page: https://github.com/iTaybb/pySmartDL/
Bugs and issues: https://github.com/iTaybb/pySmartDL/issues
https://stackoverflow.com/questions/58571343