TensorFlow PoolAllocator Messages When Running on GPU

Author: sladesal
Published: 2018-08-27 11:03:24
Column: 机器学习之旅

While training a deep model on the GPU, I ran into the following output:

...
2018-06-27 18:09:11.701458: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 63521 get requests, put_count=63521 evicted_count=1000 eviction_rate=0.0157428 and unsatisfied allocation rate=0.0173171
2018-06-27 18:09:11.701503: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
Global_step 2000        Train_loss: 0.0758
Global_step 3000        Train_loss: 0.0618
Global_step 4000        Train_loss: 0.0564
Global_step 5000        Train_loss: 0.0521
Global_step 6000        Train_loss: 0.0492
Global_step 7000        Train_loss: 0.0468
Global_step 8000        Train_loss: 0.0443
Global_step 9000        Train_loss: 0.0422
Global_step 10000       Train_loss: 0.0410
Global_step 11000       Train_loss: 0.0397
Global_step 12000       Train_loss: 0.0383
2018-06-27 18:13:59.743133: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 71532 get requests, put_count=71532 evicted_count=1000 eviction_rate=0.0139798 and unsatisfied allocation rate=0.0143013
2018-06-27 18:13:59.743167: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
...

Besides the regular loss values, I noticed these warning-like informational messages interleaved in the output. Although the final results were perfectly fine, curiosity got the better of me and I found the explanation on Stack Overflow:

TensorFlow has multiple memory allocators, for memory that will be used in different ways. Their behavior has some adaptive aspects. In your particular case, since you're using a GPU, there is a PoolAllocator for CPU memory that is pre-registered with the GPU for fast DMA. A tensor that is expected to be transferred from CPU to GPU, e.g., will be allocated from this pool.

The PoolAllocators attempt to amortize the cost of calling a more expensive underlying allocator by keeping around a pool of allocated then freed chunks that are eligible for immediate reuse. Their default behavior is to grow slowly until the eviction rate drops below some constant. (The eviction rate is the proportion of free calls where we return an unused chunk from the pool to the underlying pool in order not to exceed the size limit.) In the log messages above, you see "Raising pool_size_limit_" lines that show the pool size growing.

Assuming that your program actually has a steady state behavior with a maximum size collection of chunks it needs, the pool will grow to accommodate it, and then grow no more. It behaves this way rather than simply retaining all chunks ever allocated so that sizes needed only rarely, or only during program startup, are less likely to be retained in the pool.

These messages should only be a cause for concern if you run out of memory. In such a case the log messages may help diagnose the problem. Note also that peak execution speed may only be attained after the memory pools have grown to the proper size.
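To make the quoted behavior concrete, here is a minimal toy sketch of the idea in Python. It is not TensorFlow's actual code (the real logic lives in tensorflow/core/common_runtime/gpu/pool_allocator.cc); the class name ToyPoolAllocator, the 1% eviction threshold and the 10% growth factor are assumptions chosen only so the numbers resemble the log above. Note that the logged eviction_rate is simply evicted_count / put_count, e.g. 1000 / 63521 ≈ 0.0157.

class ToyPoolAllocator:
    """Toy model of a size-limited pool of reusable memory chunks.

    NOT TensorFlow's real PoolAllocator; it only illustrates "keep a pool of
    freed chunks, evict when the pool is full, and raise the size limit while
    the eviction rate stays high". Names and thresholds are illustrative.
    """

    def __init__(self, pool_size_limit=100):
        self.pool = []                   # freed chunk sizes kept for immediate reuse
        self.pool_size_limit = pool_size_limit
        self.get_count = 0               # allocation requests
        self.put_count = 0               # chunks returned (freed) to the pool
        self.evicted_count = 0           # chunks dropped because the pool was full

    def get(self, size):
        """Serve a request from the pool if a matching chunk exists."""
        self.get_count += 1
        if size in self.pool:
            self.pool.remove(size)       # cheap reuse of a pooled chunk
        # otherwise this would fall through to the expensive underlying allocator
        return size

    def put(self, size):
        """Free a chunk back to the pool, evicting and maybe growing the limit."""
        self.put_count += 1
        self.pool.append(size)
        if len(self.pool) > self.pool_size_limit:
            self.pool.pop(0)             # eviction: oldest chunk goes back to the underlying allocator
            self.evicted_count += 1
            eviction_rate = self.evicted_count / self.put_count
            if eviction_rate > 0.01:     # threshold chosen only for illustration
                new_limit = int(self.pool_size_limit * 1.1)  # ~10% growth, e.g. 100 -> 110
                print("Raising pool_size_limit_ from %d to %d"
                      % (self.pool_size_limit, new_limit))
                self.pool_size_limit = new_limit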

The key part of the answer explains the mechanism, how it is handled, and why. In short: the PoolAllocator manages a pool of CPU memory that is pre-registered with the GPU, so the CPU and GPU are not isolated and tensors can be transferred between them quickly. If your workload needs more chunks than the current pool size limit, TensorFlow simply raises the limit, and once the pool is large enough the messages stop and nothing is affected. Still, they are a hint that you are staging a lot of data per step: consider shrinking the batch size so less data is loaded at once (or, failing that, killing your neighbor's job on the shared machine).
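For reference, a minimal sketch of the usual remedy, feeding less data per step with a smaller batch in a tf.data input pipeline, written in the TensorFlow 1.x style that matches these 2018 logs. The names features, labels and batch_size are placeholders, not anything from the original model.

import numpy as np
import tensorflow as tf

# Placeholder data standing in for the real training set.
features = np.random.rand(10000, 128).astype(np.float32)
labels = np.random.randint(0, 2, size=(10000,)).astype(np.int64)

batch_size = 64  # try halving this if the host-side staging pool keeps growing

dataset = (tf.data.Dataset.from_tensor_slices((features, labels))
           .shuffle(buffer_size=1000)
           .batch(batch_size)    # smaller batches mean smaller CPU->GPU transfers per step
           .prefetch(1))         # keep only one batch staged ahead of the GPU

iterator = dataset.make_one_shot_iterator()
next_features, next_labels = iterator.get_next()

with tf.Session() as sess:
    batch_x, batch_y = sess.run([next_features, next_labels])
    print(batch_x.shape)  # (64, 128)

Keeping the batch and prefetch buffer small bounds how much host memory has to be staged for transfer to the GPU at any one time, which is exactly the pool these log lines are about.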

Originally published on the author's personal site/blog on 2018-06-28.
