I'm using Python's multiprocessing module to run some long-running jobs in parallel. I start the jobs with the start() method, but once they return I'd like to run them again.
Can I reuse the processes I created, or do I have to create a new Process object every time I want to run a job?
This section of the Python documentation suggests that I can only call start() once, but perhaps someone knows another way to reuse the instances:
start()
    Start the process's activity.
    This must be called at most once per process object. It arranges for the object's run() method to be invoked in a separate process.
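The one-call limit can be seen directly: CPython guards start() with an assertion, so a second call on the same object is rejected. A minimal demonstration (the worker function is just a placeholder):

```python
import multiprocessing

def work():
    pass  # stand-in for a long-running job

if __name__ == "__main__":
    p = multiprocessing.Process(target=work)
    p.start()
    p.join()
    try:
        p.start()                           # second start() on the same object
        restarted = True
    except (AssertionError, RuntimeError):  # raised by the guard in start()
        restarted = False
    print(restarted)  # prints False
```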
Here is my version of the Process class:
class Process(multiprocessing.Process):

    def __init__(self, result_queue, MCMCinstance):
        assert isinstance(MCMCinstance, MCMC)
        multiprocessing.Process.__init__(self)
        self.result_queue = result_queue
        self.mcmc = MCMCinstance
        self.interface = C_interface(self.mcmc)
        self.burn_in = False

    def run(self):
        if self.burn_in: self.interface.burn_in()
        self.interface.sample(self.mcmc.options.runs)
        self.interface.update(self.mcmc)
        self.result_queue.put(self.mcmc)
Then I instantiate the processes and run them with the start() method:
# setup the jobs and run
result_queue = multiprocessing.Queue()

mcmc1 = MCMC(options, donors, clusters)
mcmc2 = MCMC(options, donors, clusters)
mcmc3 = MCMC(options, donors, clusters)
mcmc4 = MCMC(options, donors, clusters)

p1 = Process(result_queue, mcmc1)
p2 = Process(result_queue, mcmc2)
p3 = Process(result_queue, mcmc3)
p4 = Process(result_queue, mcmc4)

jobs = [p1, p2, p3, p4]

for job in jobs:
    job.start()

results = [result_queue.get() for job in jobs]
Posted on 2014-05-14 09:56:04
As the documentation states, you can only call the start() method once; I believe you have to create new processes each time:
# setup the jobs and run
result_queue = multiprocessing.Queue()

mcmc1 = MCMC(options, donors, clusters)
mcmc2 = MCMC(options, donors, clusters)
mcmc3 = MCMC(options, donors, clusters)
mcmc4 = MCMC(options, donors, clusters)

p1 = Process(result_queue, mcmc1)
p2 = Process(result_queue, mcmc2)
p3 = Process(result_queue, mcmc3)
p4 = Process(result_queue, mcmc4)

jobs = [p1, p2, p3, p4]

for job in jobs:
    #job.debug_level = 1
    job.start()

results = [result_queue.get() for job in jobs]
#for res in results: res.traceplot(show=False)

p5 = Process(result_queue, results[0])
p6 = Process(result_queue, results[1])
p7 = Process(result_queue, results[2])
p8 = Process(result_queue, results[3])

jobs2 = [p5, p6, p7, p8]

for j in jobs2:
    j.start()

results2 = [result_queue.get() for job in jobs2]
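The same rerun pattern can be shown self-contained with a trivial stand-in for the sampling step (run_job here is hypothetical, advancing a state by one): fresh Process objects every round, with each round's results seeding the next.

```python
import multiprocessing

def run_job(state, result_queue):
    # hypothetical stand-in for one sampling run: advance the state by one
    result_queue.put(state + 1)

if __name__ == "__main__":
    result_queue = multiprocessing.Queue()
    states = [0, 10, 20, 30]
    for round_no in range(2):
        # fresh Process objects every round -- start() is single-use
        jobs = [multiprocessing.Process(target=run_job, args=(s, result_queue))
                for s in states]
        for job in jobs:
            job.start()
        states = [result_queue.get() for _ in jobs]  # results seed the next round
        for job in jobs:
            job.join()
    # queue order is nondeterministic, so sort before printing
    print(sorted(states))  # [2, 12, 22, 32]
```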
Posted on 2014-05-14 10:31:41
To reuse processes, you should use a pool. I haven't tested it, but something like this might work:
SENTINEL = "SENTINEL"

class Worker(object):

    def __init__(self, result_queue, MCMCinstance):
        assert isinstance(MCMCinstance, MCMC)
        self.result_queue = result_queue
        self.mcmc = MCMCinstance
        self.interface = C_interface(self.mcmc)
        self.burn_in = False

    def run(self):
        if self.burn_in: self.interface.burn_in()
        self.interface.sample(self.mcmc.options.runs)
        self.interface.update(self.mcmc)
        # Toggle: signal exit by putting SENTINEL in the queue,
        # or put the result back to run it again
        if True:
            self.result_queue.put(SENTINEL)
        else:
            self.result_queue.put(self.mcmc)

def run(result_queue):
    while True:
        instance = result_queue.get(True)
        if instance == SENTINEL:
            break
        worker = Worker(result_queue, instance)
        worker.run()

if __name__ == "__main__":
    result_queue = multiprocessing.Queue()
    pool = multiprocessing.Pool(3, run, (result_queue,))  # use a pool with 3 processes

    mcmc1 = MCMC(options, donors, clusters)
    mcmc2 = MCMC(options, donors, clusters)
    mcmc3 = MCMC(options, donors, clusters)
    mcmc4 = MCMC(options, donors, clusters)

    result_queue.put(mcmc1)
    result_queue.put(mcmc2)
    result_queue.put(mcmc3)
    result_queue.put(mcmc4)

    pool.close()
    pool.join()
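The answer above is untested; here is a runnable version of the same pool-plus-sentinel idea with a trivial stand-in task (square_job replaces the MCMC work, and it uses separate task and result queues via a Manager so they can be shared with pool workers safely; one SENTINEL per worker so every loop exits):

```python
import multiprocessing

SENTINEL = "SENTINEL"

def square_job(task_queue, result_queue):
    # worker loop: pull tasks until a sentinel arrives
    while True:
        item = task_queue.get()
        if item == SENTINEL:
            break
        result_queue.put(item * item)

if __name__ == "__main__":
    manager = multiprocessing.Manager()
    task_queue = manager.Queue()
    result_queue = manager.Queue()
    n_workers = 3
    # each pool process runs square_job as its initializer, i.e. the worker loop
    pool = multiprocessing.Pool(n_workers, square_job, (task_queue, result_queue))

    for item in [1, 2, 3, 4]:
        task_queue.put(item)
    for _ in range(n_workers):      # one sentinel per worker
        task_queue.put(SENTINEL)

    results = sorted(result_queue.get() for _ in range(4))
    pool.close()
    pool.join()
    print(results)  # [1, 4, 9, 16]
```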
Posted on 2014-05-14 10:40:04
No, it's not possible. There is a dedicated safeguard against it in start().
I can only speculate about why it isn't reusable, but I think it was a design choice: recycling the object would probably add more logic to the class than it's worth. The more interesting question, I think, is why it is this way.
Having read the source code for the past 20 minutes, though, I can say with confidence that merely forking the whole Python process costs far more time than creating a new instance of the object, so performance-wise it hardly matters.
As for your code, you can compress it a bit: you don't need named Process instances, and you can take advantage of list comprehensions.
# setup the jobs and run
result_queue = multiprocessing.Queue()

# a comprehension creates four independent instances
# ([MCMC(...)]*4 would repeat one shared instance)
mcmc_list = [MCMC(options, donors, clusters) for _ in range(4)]
jobs = [Process(result_queue, mcmc) for mcmc in mcmc_list]

for job in jobs:
    #job.debug_level = 1
    job.start()

results = [result_queue.get() for job in jobs]
#for res in results: res.traceplot(show=False)

jobs2 = [Process(result_queue, result) for result in results]

for j in jobs2:
    j.start()

results2 = [result_queue.get() for job in jobs2]
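One pitfall to avoid when building the list: sequence repetition like `[MCMC(options, donors, clusters)] * 4` yields four references to a single object, not four chains, so a list comprehension is the safe form. A minimal illustration with a stand-in class:

```python
class State:
    def __init__(self):
        self.runs = 0

# sequence repetition: four references to ONE object
shared = [State()] * 4
shared[0].runs = 99
print(shared[3].runs)          # 99 -- every element is the same object

# list comprehension: four independent objects
separate = [State() for _ in range(4)]
separate[0].runs = 99
print(separate[3].runs)        # 0 -- untouched
```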
Edit: I also think you are misusing the Queue; it is meant for communication between processes, and I don't believe you need it here. To create a pool of worker processes, you should use Pool and Pool.map. I can't give an exact code example without seeing the original target function, though; it would need some adjustment.
https://stackoverflow.com/questions/23650576