I am learning Python multiprocessing and trying to use it to populate a list with every file that exists in the OS. However, the code I wrote executes only sequentially.
#!/usr/bin/python
import os
import multiprocessing
tld = [os.path.join("/", f) for f in os.walk("/").next()[1]]  # Gets the top-level directory names inside "/"
manager = multiprocessing.Manager()
files = manager.list()
def get_files(x):
    for root, dir, file in os.walk(x):
        for name in file:
            files.append(os.path.join(root, name))

mp = [multiprocessing.Process(target=get_files, args=(tld[x],))
      for x in range(len(tld))]

for i in mp:
    i.start()
    i.join()
print len(files)
When I check the process tree, I can see only a single child process that was spawned. (man pstree says that {} denotes child processes spawned by the parent.)
---bash(10949)---python(12729)-+-python(12730)---{python}(12752)
                               `-python(12750)
What I am looking for is to spawn a process for each tld directory to populate the shared list files, which should be about 10-15 processes depending on the number of directories. What am I doing wrong?
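For what it is worth, the sequential behavior in the snippet above comes from calling join() inside the same loop as start(): each child is waited on to completion before the next one is launched. A minimal sketch of the usual start-all-then-join-all pattern, reusing the get_files and tld defined above:

mp = [multiprocessing.Process(target=get_files, args=(d,)) for d in tld]
for p in mp:
    p.start()   # launch every worker before waiting on any of them
for p in mp:
    p.join()    # now wait for all of the workers together
print len(files)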
EDIT:
I used multiprocessing.Pool to create the workers, and this time the processes do get forked, but it gives an error when I try to use multiprocessing.Pool.map(). I was referring to the following code from the Python docs, which shows:
from multiprocessing import Pool
def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
Following this example, I rewrote the code as:
import os
import multiprocessing
tld = [os.path.join("/", f) for f in os.walk("/").next()[1]]
manager = multiprocessing.Manager()
pool = multiprocessing.Pool(processes=len(tld))
print pool
files = manager.list()
def get_files(x):
    for root, dir, file in os.walk(x):
        for name in file:
            files.append(os.path.join(root, name))

pool.map(get_files, [x for x in tld])
pool.close()
pool.join()
print len(files)
And this time it forks multiple processes.
---bash(10949)---python(12890)-+-python(12967)
|-python(12968)
|-python(12970)
|-python(12971)
|-python(12972)
---snip---
But the code errors out, saying:
Process PoolWorker-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
AttributeError: 'module' object has no attribute 'get_files'
What am I doing wrong here, and why does the get_files() function error out?
Posted on 2015-10-08 18:54:06
It is quite simple: you instantiate your pool before defining the function get_files:
import os
import multiprocessing
tld = [os.path.join("/", f) for f in os.walk("/").next()[1]]
manager = multiprocessing.Manager()
files = manager.list()
def get_files(x):
    for root, dir, file in os.walk(x):
        for name in file:
            files.append(os.path.join(root, name))
pool = multiprocessing.Pool(processes=len(tld)) # Instantiate the pool here
pool.map(get_files, [x for x in tld])
pool.close()
pool.join()
print len(files)
The overall idea with processes is that at the very instant you start one, you fork the memory of the main process. So any definition made in the main process after the fork will not be present in the subprocess.
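A minimal sketch of that point, with hypothetical function names and assuming the default fork start method on POSIX as in the question: the workers are forked when Pool() is created, so they can only look up names that already existed in the parent at that moment.

from multiprocessing import Pool

def defined_before(x):
    return x * 2

pool = Pool(2)  # workers fork here, with defined_before already in memory

def defined_after(x):
    return x * 3

print pool.map(defined_before, [1, 2])  # works: prints [2, 4]
# pool.map(defined_after, [1, 2])       # workers crash with: AttributeError:
#                                       # 'module' object has no attribute 'defined_after'
pool.close()
pool.join()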
If you want shared memory, you can instead use the threading library, although it has some drawbacks of its own (cf. the global interpreter lock).
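As a rough illustrative sketch (not from the answer above): with threads, a plain list can be shared directly with no Manager needed, and since os.walk is I/O-bound the GIL matters less here than it would for CPU-bound work.

import os
import threading

tld = [os.path.join("/", f) for f in os.walk("/").next()[1]]
files = []         # plain list; threads share the process memory directly
lock = threading.Lock()

def get_files(x):
    found = []
    for root, dirs, names in os.walk(x):
        for name in names:
            found.append(os.path.join(root, name))
    with lock:     # guard the shared list against concurrent appends
        files.extend(found)

threads = [threading.Thread(target=get_files, args=(d,)) for d in tld]
for t in threads:
    t.start()
for t in threads:
    t.join()
print len(files)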
https://stackoverflow.com/questions/33012638