What is the best way to split a large file for multiprocessing in Python?

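The usual way to split a large file for multiprocessing in Python is to use the standard-library multiprocessing module or concurrent.futures (ProcessPoolExecutor). Neither module splits a file by itself. You read the file in pieces, or compute a byte range for each piece, and hand the pieces to a pool of worker processes that run in parallel on multiple CPU cores.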
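For line-oriented data such as logs, CSV or JSON Lines, a simpler option than fixed-size byte chunks is to let multiprocessing.Pool distribute whole lines to the workers. The following is an illustrative sketch and not the answer's own code. process_line and count_words are hypothetical names, the file is assumed to be UTF-8 text, and word counting stands in for your real per-line work.

```python
import multiprocessing


def process_line(line):
    # Hypothetical per-line work: count the whitespace-separated words.
    return len(line.split())


def count_words(file_path, workers=None, batch=10_000):
    # Iterating over a text file yields whole lines, so no line is ever cut
    # in half. Pool.imap sends the lines to the worker processes in batches
    # of `batch` lines (fewer, larger messages) and yields results in order.
    with open(file_path, "r", encoding="utf-8") as f:
        with multiprocessing.Pool(processes=workers) as pool:
            return sum(pool.imap(process_line, f, chunksize=batch))
```

Call count_words("path/to/your/large/file") from inside an `if __name__ == '__main__':` block. On platforms that start workers with "spawn" (Windows, macOS), the worker processes re-import the main module.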

The following simple example shows how to use the multiprocessing and concurrent.futures libraries to split a large file into chunks and process them in parallel:

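import concurrent.futures
from multiprocessing import cpu_count

def process_chunk(chunk):
    # Process one chunk of the file here.
    # Placeholder: return the chunk's size in bytes.
    return len(chunk)

def split_file(file_path, chunk_size=1024*1024):
    # Yield the file as consecutive chunks of chunk_size bytes.
    # A chunk boundary can fall in the middle of a line or record.
    with open(file_path, 'rb') as file:
        while True:
            chunk = file.read(chunk_size)
            if not chunk:
                break
            yield chunk

def process_file(file_path, chunk_size=1024*1024):
    # Use worker processes, not threads: in standard CPython the GIL lets only
    # one thread run Python bytecode at a time, so CPU-bound work needs processes.
    with concurrent.futures.ProcessPoolExecutor(max_workers=cpu_count()) as executor:
        futures = []
        # Each chunk is pickled and sent to a worker. Submitting every chunk
        # up front can hold most of the file in memory at once.
        for chunk in split_file(file_path, chunk_size):
            future = executor.submit(process_chunk, chunk)
            futures.append(future)

        # as_completed yields futures in completion order, not file order.
        results = []
        for future in concurrent.futures.as_completed(futures):
            results.append(future.result())
    return results

if __name__ == '__main__':
    file_path = 'path/to/your/large/file'
    results = process_file(file_path)
    print(sum(results))

In this example, process_chunk handles a single chunk, and split_file reads the file sequentially in fixed-size chunks. process_file uses a ProcessPoolExecutor to distribute those chunks across worker processes and collect their results.

Because each chunk is processed in a separate worker process, this approach can use several CPU cores at once for CPU-bound work. With ThreadPoolExecutor, the GIL would serialize pure-Python processing. Keep two limitations in mind. First, fixed-size byte chunks can split a line or record across two chunks. Second, submitting every chunk up front can hold most of the file in memory. In practice, adapt the chunking and process_chunk to your data.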
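A common refinement is sketched below under stated assumptions; it is not part of the answer above. The parent does not read the data and pickle each chunk to a worker. Instead, it computes line-aligned byte ranges once, and each worker opens the file and reads only its own range. line_aligned_ranges, process_range and count_words_parallel are hypothetical names, and word counting stands in for real work. Lines are assumed to end with b"\n", which also covers "\r\n".

```python
import os
from concurrent.futures import ProcessPoolExecutor


def line_aligned_ranges(file_path, n_parts):
    """Split the file into at most n_parts byte ranges that begin at line starts."""
    size = os.path.getsize(file_path)
    step = max(1, size // n_parts)
    bounds = [0]
    with open(file_path, "rb") as f:
        for i in range(1, n_parts):
            f.seek(i * step)
            f.readline()  # move past the end of the line containing this offset
            pos = f.tell()
            # Keep the boundary only if it advances and is not at end of file.
            if bounds[-1] < pos < size:
                bounds.append(pos)
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))


def process_range(file_path, start, end):
    """Hypothetical per-range work: count whitespace-separated words."""
    words = 0
    # Each worker opens the file itself; only the path and offsets are pickled.
    with open(file_path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            words += len(line.split())
    return words


def count_words_parallel(file_path, workers=None):
    workers = workers or os.cpu_count() or 1
    ranges = line_aligned_ranges(file_path, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_range, file_path, start, end)
                   for start, end in ranges]
        return sum(future.result() for future in futures)
```

Run count_words_parallel("path/to/your/large/file") under `if __name__ == '__main__':`. Only the path and two offsets travel between processes, so the parent never reads the data itself. Every range starts at a line start, so no line is split between two workers.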

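UDP in parallel with ThreadPoolExecutor

udp, python-multithreading

I need to receive messages from different UDP ports. My code works, but I don't know why the messages wait for each other! I mean, if I don't send the messages in dict order, they don't get through the function. The function waits for each cycle to change port. import concurrent.futures, socket ports = {} ports['BRAIN'] = 10015 ports['REAPER'] = 10025 ports['CSOUND'] = 10000 def receive_from(port): host = 'localhost' buffer_s

Viewed 7 times, asked 2022-11-14, score 0

1 answer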

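How to use tqdm with multithreading?

python, multithreading, tqdm

I'm trying to use tqdm to report the download progress of each file from three links. I want to use multithreading to download from every link at the same time while the progress bars update. However, when I run my script, I get multiple lines of progress bars, and the threads seem to be updating the tqdm progress bars simultaneously. How can I run multiple threads to download the files while keeping one progress bar per download, without duplicate bars filling the whole screen? Here is my code. import os import sys import requests from pathlib import Path from tqdm import tqdm from concurrent.futures import ThreadPoolExecutor as PE def get_filename(ur

Viewed 4 times, asked 2020-09-10, score 7

1 answer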

Is there a way to run "predict" on a preloaded Keras model in parallel?

python, tensorflow, keras, parallel-processing, joblib

I have a class in which I instantiate a Keras model to perform predictions. The class is organized as follows: class MyClass(): def __init__(self): self.model = None def load(path): self.model = tf.keras.models.load_model(path_) def inference(data): #... pred = self.model.predict(data) #...

Viewed 5 times, asked 2021-09-30, score 2

1 answer

concurrent.futures does not write in parallel

python, multithreading, python-3.x, concurrent.futures

I have a list dataframe_chunk that contains chunks of a very large pandas DataFrame. I want to write each chunk to a different CSV file, in parallel. However, the files are written sequentially, and I don't know why. Here is the code: import concurrent.futures as cfu def write_chunk_to_file(chunk, fpath): chunk.to_csv(fpath, sep=',', header=False, index=False) pool = cfu.ThreadPoolExecutor(N_CORES) futures = [] for i

Viewed 2 times, asked 2016-07-19, score 0

1 answer

SSH and ping hosts concurrently with Python asyncio?

ssh, python-asyncio, ping, paramiko

I'm trying to SSH/ping hosts concurrently, but I don't see any results, so my implementation is probably incorrect. This is what I have so far. Any ideas are appreciated. import paramiko import time import asyncio import subprocess async def sshTest(ipaddress,deviceUsername,devicePassword,sshPort): #finalDict try: print("Performing SSH Connection to the device")

Viewed 8 times, asked 2022-02-18, score 0

Answer accepted

2 answers

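Python: How to use an external queue with ProcessPoolExecutor?

python, asynchronous, multiprocessing

I recently started using Python's multithreading and multiprocessing features. I tried to write code that uses a producer/consumer approach. It reads chunks from a JSON log file and writes those chunks to a queue as events. It then starts a set of processes that poll the queue for events (file chunks), process each one, and print the results. I want to start the processes first and let them wait for events to arrive in the queue. I'm currently using this code, which seems to work and is pieced together from snippets I found in examples: import re, sys from multiprocessing import Process, Queue def process(file, chunk): f = open(file, "rb")

Viewed 1 time, asked 2014-09-10, score 6

Answer accepted

1 answer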

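How to parallelize computation with asyncio?

python, python-3.x, parallel-processing, python-asyncio

I have a block of code that takes a long time to execute and is very CPU-intensive. I want to run this block several times and use the full power of my CPU. From looking at asyncio, I understand that it is mainly meant for asynchronous communication but is also a general tool for asynchronous tasks. In the example below, time.sleep(y) is a placeholder for the code I want to run. In this example, the coroutines execute one after another, and the total run takes about 8 seconds. import asyncio import logging import time async def _do_compute_intense_stuff(x, y, logger): logger.info('Getting it started...'

Viewed 1 time, asked 2018-07-10, score 3

Answer accepted

1 answer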

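Python 3.4 concurrent.futures.Executor does not allow pausing and resuming threads

python, multithreading, python-multithreading, python-3.4, concurrent.futures

I'm using concurrent.futures.ThreadPoolExecutor for multithreading, and I'm calling several HTTP services. When the server goes down, I want to pause the threads, start the server, and then resume execution. The trigger for server downtime is a check for whether a certain file exists at a specific location, in which case I have to pause execution. concurrent.futures.Executor.shutdown() signals the executor that it should free any resources it is using once the currently pending futures finish executing. However, when I call the executor's shutdown() method, it does not stop the threads immediately; the shutdown only happens after the entire execution has completed. In fact, since in concurre

Viewed 4 times, asked 2014-07-25, score 2

1 answer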

Python script execution time increases when using ProcessPoolExecutor

python-3.x, subprocess, python-multiprocessing, concurrent.futures, process-pool

When I use a process pool executor to launch parallel instances of a Python script on a 56-core machine, its execution time increases. The script abc.py imports a heavy Python library, which takes about 1 second. time python ~/abc.py real 0m0.846s user 0m0.620s sys 0m0.078s Test method import shlex from subprocess import Popen, PIPE def test(): command = "python /u/deeparora/abc.py" p = Popen(shlex.split(comman

Viewed 8 times, asked 2021-12-30, score 1

1 answer

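python, python-multithreading, concurrent.futures

I'm fairly new to parallel processing with concurrent.futures and am trying some simple experiments. The code I wrote seems to work, but I don't know how to store the results. I tried creating a list ("futures") and appending the results to it, but that slows the program down considerably. Is there a better way to do this? Thanks. import concurrent.futures import time couple_ods= [] futures=[] dtab={} for i in range(100): for j in range(100): dtab[i,j]=i+j/2 couple_ods.append((i,

Viewed 1 time, asked 2018-08-29, score 14

1 answer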

`ProcessPoolExecutor` works on Ubuntu but fails with `BrokenProcessPool` in a Jupyter 5.0.0 notebook running Python 3.5.3 on Windows 10

windows, multithreading, python-3.x, multiprocessing, jupyter-notebook

I'm running a Python 3.5.3 notebook in Jupyter 5.0.0 on Windows 10. The following sample code fails to run: from concurrent.futures import as_completed, ProcessPoolExecutor import time import numpy as np def do_work(idx1, idx2): time.sleep(0.2) return np.mean([idx1, idx2]) with ProcessPoolExecutor(max_workers=4) as executor: futures =

Viewed 3 times, asked 2017-05-07, score 8

2 answers

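Multiprocessing in Python: multiple processes run the same instructions

python, multiprocessing, python-multiprocessing

I'm using multiprocessing in Python for parallelization. I'm trying to use pandas to process, in parallel, chunks of data read from an Excel file. I'm new to multiprocessing and parallel processing. While implementing some simple code, import time; import os; from multiprocessing import Process import pandas as pd print os.getpid(); df = pd.read_csv('train.csv', sep=',',usecols=["POLYLINE"],iterator=True,chunksize=2); print &

Viewed 0 times, asked 2016-04-30, score 1

2 answers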

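concurrent.futures question: why is there only one worker?

python, python-3.x, concurrency, multiprocessing, concurrent.futures

I'm experimenting with concurrent.futures.ProcessPoolExecutor to parallelize a serial task. The serial task counts the occurrences of a given number within a range of numbers. My code is shown below. During execution, Task / System / top showed only one CPU/thread running, even though max_workers of ProcessPoolExecutor was greater than 1. How can I parallelize the code with concurrent.futures? My code runs on Python 3.5. import concurrent.futures as cf from time import time def _findmatch(nmax,

Viewed 4 times, asked 2017-02-05, score 3

Answer accepted

1 answer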

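How to speed up processing a large number of patches in an image?

python, numpy, parallel-processing, computer-vision, image-preprocessing

I wrote a function that processes an image. It extracts many patches and processes each one with the same function (func) to produce a new image. However, it is very slow because of the two loops, func itself, the number of patches, and the patch size. I don't know how to speed this code up. The function is shown below. # code1 def filter(img, func, ksize, strides=1): height,width = img.shape f_height,f_width = ksize new_height = height - f_height + 1 new_width = width - f_width + 1

Viewed 0 times, asked 2019-07-19, score 1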


Related content


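pymssql/pyodbc performance (cursor.execute) is very slow when querying a large SQL Server table

Speeding up a for loop that makes PUT requests

Terminating a try block in Python after N seconds

Using Python multiprocessing on a for loop that appends results to a dictionary

Python 3 asyncio and the GIL (how to use all CPU cores: any option other than ProcessPoolExecutor?)

How to get currentThread().getName() as part of a (concurrent.futures) future's result?

Storing ThreadPoolExecutor results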
