
Getting files and images from a specific website using threads and a queue

Stack Overflow user

Asked 2019-06-08 12:02:21

Answers 1 · Views 158 · Followers 0 · Votes 0

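I'm trying to write a simple Python 3 program with threads and a queue that uses 4 or more threads to download 4 images at a time from URL links into the PC's Downloads folder, while avoiding duplicate downloads by sharing information between the threads. I thought I could use something like URL1 = "Link1"? Here are some example links: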

"https://unab-dw2018.s3.amazonaws.com/ldp2019/1.jpeg"

"https://unab-dw2018.s3.amazonaws.com/ldp2019/2.jpeg"
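Because the links differ only in the number before ".jpeg", one variable per link (URL1, URL2, ...) is not needed; a list can be generated instead. A minimal sketch, assuming the images are numbered consecutively from 1 under the same prefix (the question later mentions 20 images); `build_image_urls` is a hypothetical helper name:

```python
def build_image_urls(prefix, count):
    """Return [prefix + "1.jpeg", ..., prefix + "<count>.jpeg"]."""
    # Assumes the files are numbered 1..count with no gaps
    return [f"{prefix}{n}.jpeg" for n in range(1, count + 1)]


URL_PREFIX = "https://unab-dw2018.s3.amazonaws.com/ldp2019/"
image_urls = build_image_urls(URL_PREFIX, 20)
print(image_urls[0])   # https://unab-dw2018.s3.amazonaws.com/ldp2019/1.jpeg
print(image_urls[-1])  # https://unab-dw2018.s3.amazonaws.com/ldp2019/20.jpeg
```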

But I don't know how to use threads together with a queue, and I'm not sure how to go about it.

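I've searched for pages that explain how to combine threads with a queue for concurrent downloads, but I only found material about threads on their own.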

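Here is some partially working code. What I need is for the program to ask how many threads you want and then keep downloading images until it reaches image 20. In the current code, though, if you enter 5 it downloads only 5 images, and so on. What I want is: if I enter 5, it downloads the first 5 images, then the next 5, and so on up to 20. With 4, that is 4, 4, 4, 4, 4. With 6, it goes 6, 6, 6 and then downloads the remaining 2. Somehow I have to work a queue into the code, but I only started learning about threads a few days ago and I'm lost on how to combine threads and queues.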
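The round sizes described above (rounds of N downloads, with a smaller final round for whatever is left) come down to simple arithmetic. A minimal sketch; `batch_sizes` is a hypothetical helper name, not part of the question's code:

```python
def batch_sizes(total, threads):
    """Split `total` downloads into rounds of at most `threads` each."""
    if threads < 1:
        raise ValueError("need at least one thread")
    sizes = []
    remaining = total
    while remaining > 0:
        # each round launches `threads` downloads, or whatever is left over
        sizes.append(min(threads, remaining))
        remaining -= sizes[-1]
    return sizes


print(batch_sizes(20, 4))  # [4, 4, 4, 4, 4]
print(batch_sizes(20, 6))  # [6, 6, 6, 2]
```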

import threading
import urllib.request
import queue # i need to use this somehow


def worker(cont):
    print("The worker is ON",cont)
    image_download = "https://unab-dw2018.s3.amazonaws.com/ldp2019/"+str(cont)+".jpeg"
    download = urllib.request.urlopen(image_download)
    # "with" closes the file even if the write fails
    with open("Image "+str(cont)+".jpeg", "wb") as file_save:
        file_save.write(download.read())


threads = []
q_threads = int(input("Choose input amount of threads between 4 and 20"))
for i in range(0, q_threads):
    # args must hold exactly worker's one argument; (i+1, int) would pass two
    h = threading.Thread(target=worker, args=(i+1,))
    threads.append(h)
for i in range(0, q_threads):
    threads[i].start()

Tags: urllib · python-multithreading · python-3.x · multithreading · queue

1 answer

Stack Overflow user

Accepted answer

Answered 2019-06-08 12:22:36

I adapted the following from some code I used to run a multithreaded PSO:

import threading
import queue
import urllib.request


# the class must be defined before the main block that instantiates it
class picture_getter(threading.Thread):
    def __init__(self, url, picture_queue):
        super(picture_getter, self).__init__()
        self.url = url
        self.picture_queue = picture_queue

    def run(self):
        print("Starting download on " + str(self.url))
        self._get_picture()

    def _get_picture(self):
        # --- get your picture --- #
        with urllib.request.urlopen(self.url) as response:
            picture = response.read()
        # hand the downloaded bytes back to the main thread
        self.picture_queue.put(picture)


if __name__ == "__main__":
    picture_queue = queue.Queue(maxsize=0)
    picture_threads = []
    picture_urls = ["https://unab-dw2018.s3.amazonaws.com/ldp2019/1.jpeg",
                    "https://unab-dw2018.s3.amazonaws.com/ldp2019/2.jpeg"]

    # create and start the threads
    for url in picture_urls:
        picture_thread = picture_getter(url, picture_queue)
        picture_threads.append(picture_thread)
        picture_thread.start()

    # wait for threads to finish
    for picture_thread in picture_threads:
        picture_thread.join()

    # get the results (safe to check empty() now that every thread has exited)
    picture_list = []
    while not picture_queue.empty():
        picture_list.append(picture_queue.get())

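As you probably know, people on Stack Overflow like to see what you've tried before offering a solution. I had this code lying around anyway, though. Welcome to the newbie club!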

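One thing I'd add: this does not avoid duplicates by sharing information between threads. It avoids duplicates because each thread is told exactly what to download. If your files are numbered as shown in your question, that shouldn't be a problem, since you can easily build a list of the file names.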
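For contrast, here is a sketch of the approach the question originally asked about, where the threads do share information: every URL goes into one `queue.Queue`, and N worker threads keep taking the next URL until the queue is empty, so no image can be handed to two threads. `download_all` and `fetch_to_file` are hypothetical names; the "Image <name>" file naming mirrors the answer's code, and the `fetch` parameter is an assumption added only so the download step can be swapped out.

```python
import queue
import threading
import urllib.request


def fetch_to_file(url, file_name):
    """Download `url` and write the response body to `file_name`."""
    with urllib.request.urlopen(url) as response, open(file_name, "wb") as f:
        f.write(response.read())


def download_all(urls, num_threads, fetch=fetch_to_file):
    """Download every URL with `num_threads` workers sharing one job queue."""
    jobs = queue.Queue()
    for url in urls:
        jobs.put(url)

    saved = queue.Queue()  # thread-safe place for workers to report results

    def worker():
        while True:
            try:
                # get_nowait() removes the URL from the shared queue, so no
                # two workers are ever handed the same image
                url = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            file_name = "Image " + url.rsplit("/", 1)[-1]
            fetch(url, file_name)
            saved.put(file_name)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # every worker has exited, so draining the results queue is now safe
    results = []
    while not saved.empty():
        results.append(saved.get())
    return results
```

Unlike fixed rounds, a worker starts on the next image as soon as it finishes its current one, so one slow download does not hold up the others. In this sketch an exception in `fetch` ends that worker's loop while the other workers carry on; wrapping the call in try/except would keep a worker alive after a failed download.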

Updated code to address the edit to Treyon's original post:

import threading
import urllib.request
import queue
import time

class picture_getter(threading.Thread):
    def __init__(self, url, file_name, picture_queue):
        self.url = url
        self.file_name = file_name
        self.picture_queue = picture_queue

        super(picture_getter, self).__init__()

    def run(self):
        print("Starting download on " + str(self.url))
        self._get_picture()

    def _get_picture(self):
        print("{}: Simulating delay".format(self.file_name))
        time.sleep(1)

        # download and save image
        with urllib.request.urlopen(self.url) as download:
            data = download.read()
        with open("Image " + self.file_name, "wb") as file_save:
            file_save.write(data)
        self.picture_queue.put("Image " + self.file_name)

def remainder_or_max_threads(num_pictures, num_threads, iterations):
    # remaining pictures
    remainder = num_pictures - (num_threads * iterations)

    # if there are equal or more pictures remaining than max threads
    # return max threads, otherwise remaining number of pictures
    if remainder >= num_threads:
        return num_threads

    else:
        return remainder

if __name__ == "__main__":
    # store the response from the threads
    picture_queue = queue.Queue(maxsize=0)
    picture_threads = []
    num_pictures = 20

    url_prefix = "https://unab-dw2018.s3.amazonaws.com/ldp2019/"
    picture_names = ["{}.jpeg".format(i+1) for i in range(num_pictures)]

    max_threads = int(input("Choose input amount of threads between 4 and 20: "))

    iterations = 0

    # during the majority of runtime iterations * max threads is 
    # the number of pictures that have been downloaded
    # when it exceeds num_pictures all pictures have been downloaded
    while iterations * max_threads < num_pictures:
        # this returns max_threads if there are max_threads or more pictures left to download
        # else it will return the number of remaining pictures
        threads = remainder_or_max_threads(num_pictures, max_threads, iterations)

        # loop through the next section of pictures, create and start their threads
        for name, i in zip(picture_names[iterations * max_threads:], range(threads)):
            picture_threads.append(picture_getter(url_prefix + name, name, picture_queue))
            picture_threads[i + iterations * max_threads].start()

        # wait for threads to finish
        for picture_thread in picture_threads:
            picture_thread.join()

        # increment the iterations
        iterations += 1

    # get the results
    picture_list = []
    while not picture_queue.empty():
        picture_list.append(picture_queue.get())

    print("Successfully downloaded")
    print(picture_list)
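For comparison, the standard library's `concurrent.futures.ThreadPoolExecutor` (a different technique from the answer's `threading.Thread` subclass) enforces "at most N downloads at a time" on its own, so `remainder_or_max_threads` and the round-by-round joins become unnecessary. A minimal sketch; `download_picture` and `download_pictures` are hypothetical names, and the `task` parameter is an assumption that exists only so the download step can be replaced:

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL_PREFIX = "https://unab-dw2018.s3.amazonaws.com/ldp2019/"


def download_picture(name):
    """Download URL_PREFIX + name and save it as "Image <name>"."""
    with urllib.request.urlopen(URL_PREFIX + name) as response:
        data = response.read()
    file_name = "Image " + name
    with open(file_name, "wb") as f:
        f.write(data)
    return file_name


def download_pictures(names, max_threads, task=download_picture):
    # The executor runs at most `max_threads` tasks at once and starts the
    # next one as soon as a worker thread is free; no batching code needed.
    with ThreadPoolExecutor(max_workers=max_threads) as executor:
        # map() yields results in the same order as `names`
        return list(executor.map(task, names))
```

Calling `download_pictures(["{}.jpeg".format(i + 1) for i in range(20)], 4)` would fetch all 20 images with at most four downloads in flight. `executor.map` re-raises a task's exception when that task's result is reached while iterating.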

Votes 2

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/56503312
