Q: Python parallel processing to unzip files

Stack Overflow user

Asked 2017-04-10 09:14:06

Answers 1, Views 3.8K, Followers 0, Votes 1

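I am new to parallel processing in Python. The code below walks through all directories and extracts every tar.gz file it finds. However, it takes quite a long time.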

import tarfile
import os

def unziptar(path):
    # Walk the whole tree and extract each archive into the folder it sits in
    for root, dirs, files in os.walk(path):
        for i in files:
            fullpath = os.path.join(root, i)
            if i.endswith("tar.gz"):
                print('extracting... {}'.format(fullpath))
                tar = tarfile.open(fullpath, 'r:gz')
                tar.extractall(root)
                tar.close()

path = 'C://path_to_folder'
unziptar(path)

print('tar.gz extraction completed')

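I have read some material on the multiprocessing and joblib packages, but it is still not clear to me how to modify my script so that it runs in parallel. Any help is much appreciated.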

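Edit: @tdelaney

Thanks for the help. Surprisingly, the modified script took twice as long to extract everything (60 minutes, compared with 30 minutes for the original script)!

I looked at Task Manager, and although multiple cores were in use, CPU usage was low. I am not sure why this is happening.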
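Low CPU usage while several workers are busy can be a sign that the job is limited by disk throughput rather than by decompression, but nothing in the description confirms that, so treat it as an assumption. One way to check is to time the same small set of archives with different worker counts. The sketch below does that under a few further assumptions: the helper names `extract_to_scratch` and `time_extraction` are made up for illustration, it uses threads (`multiprocessing.pool.ThreadPool`) rather than processes, and it extracts into a throwaway folder in the system temp directory, which may sit on a different disk than the real target.

```python
import shutil
import tarfile
import tempfile
import time
from multiprocessing.pool import ThreadPool


def extract_to_scratch(job):
    """Extract one archive into its own scratch folder; return the member count."""
    archive, dest = job
    with tarfile.open(archive, 'r:gz') as tar:
        tar.extractall(dest)
        return len(tar.getnames())


def time_extraction(archives, workers):
    """Extract `archives` with `workers` threads; return (seconds, members extracted)."""
    scratch = tempfile.mkdtemp()  # throwaway output, removed afterwards
    try:
        # One private destination per archive so the workers never collide
        jobs = [(archive, tempfile.mkdtemp(dir=scratch)) for archive in archives]
        pool = ThreadPool(workers)
        start = time.time()
        try:
            counts = pool.map(extract_to_scratch, jobs)
        finally:
            pool.close()
            pool.join()
        return time.time() - start, sum(counts)
    finally:
        shutil.rmtree(scratch, ignore_errors=True)
```

If the time for, say, 1, 2 and 4 workers barely changes or gets worse, the disk is a likely suspect. Repeated runs over the same archives can be skewed by the operating system's file cache, so the numbers are only a rough guide.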

python-2.7

parallel-processing

unzip

os.walk

1 Answer

Stack Overflow user

Accepted answer

Answered 2017-04-10 09:46:47

It is fairly easy to create a pool to do the work. Just pull the extractor out into a separate worker function.

import tarfile
import os
import multiprocessing as mp

def unziptar(fullpath):
    """worker unzips one file"""
    print('extracting... {}'.format(fullpath))
    tar = tarfile.open(fullpath, 'r:gz')
    tar.extractall(os.path.dirname(fullpath))
    tar.close()

def fanout_unziptar(path):
    """create pool to extract all"""
    my_files = []
    for root, dirs, files in os.walk(path):
        for i in files:
            if i.endswith("tar.gz"):
                my_files.append(os.path.join(root, i))

    if not my_files:
        return  # nothing to extract; mp.Pool(0) would raise ValueError

    pool = mp.Pool(min(mp.cpu_count(), len(my_files))) # number of workers
    pool.map(unziptar, my_files, chunksize=1)  # blocks until every file is done
    pool.close()
    pool.join()  # wait for the worker processes to exit


if __name__=="__main__":
    path = 'C://path_to_folder'
    fanout_unziptar(path)
    print('tar.gz extraction has completed')
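For readers on Python 3, the same fan-out idea can be sketched with `concurrent.futures`. This is not the answerer's code, only a variant under stated assumptions: the names `find_archives`, `extract_archive` and `extract_all` are hypothetical, and the `workers` and `executor_cls` parameters are added here so the pool size and pool type can be tuned. Since the question's edit reports low CPU usage, trying fewer workers or a `ThreadPoolExecutor` may be worth a test, although whether either helps depends on the disk.

```python
import os
import tarfile
from concurrent.futures import ProcessPoolExecutor


def find_archives(path):
    """Collect every .tar.gz file under path."""
    found = []
    for root, dirs, files in os.walk(path):
        for name in files:
            if name.endswith('.tar.gz'):
                found.append(os.path.join(root, name))
    return found


def extract_archive(fullpath):
    """Worker: extract one archive into its own folder and return its path."""
    with tarfile.open(fullpath, 'r:gz') as tar:
        tar.extractall(os.path.dirname(fullpath))
    return fullpath


def extract_all(path, workers=None, executor_cls=ProcessPoolExecutor):
    """Extract all archives under path; return the list of extracted archives."""
    archives = find_archives(path)
    if not archives:
        return []  # nothing to do, and a pool needs at least one worker
    if workers is None:
        workers = os.cpu_count() or 1
    workers = min(workers, len(archives))
    with executor_cls(max_workers=workers) as executor:
        return list(executor.map(extract_archive, archives))


if __name__ == '__main__':
    # Placeholder path taken from the question
    done = extract_all('C://path_to_folder')
    print('extracted {} archives'.format(len(done)))
```

Passing `executor_cls=ThreadPoolExecutor` (imported from `concurrent.futures`) switches to threads without touching the rest, and `workers=2` caps the pool at two so that fewer extractions compete for the disk at once.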

Votes 5

Page content originally provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/43313666
