Deep Learning GPU Showdown: Witness the Titan RTX's "Cash Superpower"

GPUS Lady

Published 2019-03-07 11:15:06


An overseas tech blog has published a post titled

Titan RTX TensorFlow Benchmarks

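In the post, the author measured the training speed of several common NVIDIA GPUs, including the Titan RTX, across a range of AI training tasks. For each GPU, the number of images processed per second was measured while training the following neural networks: ResNet50, ResNet152, Inception3, Inception4, VGG16, AlexNet, and SSD. The author reached the following conclusions: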

So which GPU is actually the best for deep learning in 2019? Here is what the author says:

RTX 2080 Ti is the best GPU for Machine Learning / Deep Learning if... 11 GB of GPU memory is sufficient for your training needs (for many people, it is). The 2080 Ti offers the best price/performance among the Titan RTX, Tesla V100, Titan V, GTX 1080 Ti, and Titan Xp.
Titan RTX is the best GPU for Machine Learning / Deep Learning if... 11 GB of memory isn't sufficient for your training needs. However, before concluding this, try training at half-precision (16-bit). This effectively doubles your GPU memory at the cost of training accuracy. If you're already successfully training at FP16 and 11 GB still isn't enough, then choose the Titan RTX -- otherwise, go with the RTX 2080 Ti. At half-precision, the Titan RTX offers effectively 48 GB of GPU memory.
Tesla V100 is the best GPU for Machine Learning / Deep Learning if... price isn't important, you need every bit of GPU memory available, or time to market of your product is of utmost important.

In other words, whether 11 GB of GPU memory is enough is the key consideration:

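If 11 GB of GPU memory is enough for your work, the RTX 2080 Ti currently offers the best price/performance.
If 11 GB is not enough: before concluding that, the author asks you to try half-precision (16-bit) training, which effectively doubles your GPU memory at some cost in training accuracy. If you are already training successfully at FP16 and 11 GB still isn't enough, choose the Titan RTX.
If money is no object, feel free to go with the Tesla V100.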
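
The "half precision doubles your memory" rule of thumb is simple arithmetic on bytes per value, and a minimal sketch can make the decision rule concrete. The function names `fp32_equivalent_gb` and `suggest_gpu` are hypothetical helpers introduced here for illustration, not anything from the original post, and the sketch assumes every stored value shrinks from 4 bytes to 2, which real training (optimizer state, FP32 master weights) only approximates.

```python
import struct

# Size in bytes of one value in each format (stdlib only).
BYTES_FP32 = struct.calcsize("f")  # IEEE 754 single precision
BYTES_FP16 = struct.calcsize("e")  # IEEE 754 half precision


def fp32_equivalent_gb(physical_gb):
    """Idealized capacity when every value is stored in FP16 instead of FP32."""
    return physical_gb * BYTES_FP32 / BYTES_FP16


def suggest_gpu(fp32_need_gb, small_card_gb=11):
    """Hypothetical helper that follows the author's rule of thumb.

    fp32_need_gb is the memory the training job needs at full precision.
    """
    if fp32_need_gb <= small_card_gb:
        return "RTX 2080 Ti"
    if fp32_need_gb <= fp32_equivalent_gb(small_card_gb):
        return "RTX 2080 Ti (FP16)"  # try half precision first
    return "Titan RTX"


print(fp32_equivalent_gb(11))  # 2080 Ti: 11 GB physical
print(fp32_equivalent_gb(24))  # Titan RTX: 24 GB physical
print(suggest_gpu(20))
```

Under these assumptions, 11 GB behaves like 22 GB and the Titan RTX's 24 GB behaves like the 48 GB quoted above.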

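That said, I (Lady) have covered the characteristics of each of these GPUs separately in earlier articles, and there are a few points readers should keep in mind: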

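1. According to tests posted online, the 2080 Ti's Tensor Cores deliver only half their performance on FP16 computation when results are accumulated in FP32. With pure FP16 computation the 2080 Ti does not have this problem. The Titan RTX has no such limitation in either pure FP16 or mixed FP16/FP32 precision.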

2. With an NVLink bridge, transfer performance between two Titan RTX cards is better than between two 2080 Ti cards.

3. The 2080 Ti does not support P2P access. I have not tested the Titan RTX, so I don't know yet.
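
To see why the accumulation precision in the first note matters at all, here is a toy illustration in pure Python. It is not a GPU benchmark and says nothing about Tensor Core speed; it only emulates rounding a running sum to FP16 or FP32 after each addition. The helper names `to_fp16`, `to_fp32`, and `accumulate` are made up for this sketch.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(values, round_accumulator):
    """Sum FP16 inputs, rounding the running total after every addition."""
    total = 0.0
    for v in values:
        total = round_accumulator(total + to_fp16(v))
    return total


values = [1.0] * 4096

fp16_sum = accumulate(values, to_fp16)  # FP16 inputs, FP16 accumulator
fp32_sum = accumulate(values, to_fp32)  # FP16 inputs, FP32 accumulator

print(fp16_sum)  # stalls at 2048.0: FP16 cannot represent 2049
print(fp32_sum)  # 4096.0
```

The FP16 accumulator stops growing once the total reaches 2048, because the next representable FP16 value is 2050. This is the kind of error FP32 accumulation avoids, which is why the reported half-rate behavior on the 2080 Ti is worth knowing about.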

Back to the article:

To show that the tests were rigorous, the author describes the methodology:

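All models were trained on a synthetic dataset, which isolates GPU performance from CPU preprocessing performance and reduces spurious I/O bottlenecks. Ten training experiments were run for each GPU/model pair and the results were averaged. A GPU's "normalized training performance" is its images/sec on a given model divided by the 1080 Ti's images/sec on the same model.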
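
The averaging and normalization described above can be sketched in a few lines. The function name `normalized_performance`, the "Hypothetical GPU" entry, and all throughput numbers below are placeholders invented for illustration; they are not measurements from the benchmark.

```python
from statistics import mean


def normalized_performance(runs_by_gpu, baseline="GTX 1080 Ti"):
    """Average each GPU's images/sec over its runs, then divide by the baseline.

    runs_by_gpu maps a GPU name to its per-run images/sec for ONE model.
    """
    averages = {gpu: mean(runs) for gpu, runs in runs_by_gpu.items()}
    base = averages[baseline]
    return {gpu: avg / base for gpu, avg in averages.items()}


# Placeholder data: 10 runs per GPU on a single model (made-up numbers).
runs = {
    "GTX 1080 Ti": [200.0] * 10,
    "Hypothetical GPU": [290.0, 310.0] * 5,
}

result = normalized_performance(runs)
for gpu, score in result.items():
    print(f"{gpu}: {score:.2f}x")
```

By construction the 1080 Ti always scores 1.0, so every other number reads directly as a speedup over it on that model.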

The Titan RTX, 2080 Ti, Titan V, and V100 benchmarks use Tensor Cores.

The hardware was a 2x Titan RTX desktop computer with an Intel Core i9-7920X and 64 GB of RAM; the author simply swapped the GPUs between tests.

Batch sizes: