Using GPUs

Supported devices

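On a typical system, there are multiple computing devices. In TensorFlow, the supported device types are CPU and GPU. They are represented as strings. For example:

"/cpu:0": The CPU of your machine.

"/device:GPU:0": The GPU of your machine, if you have one.

"/device:GPU:1": The second GPU of your machine, and so on.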

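If a TensorFlow operation has both CPU and GPU implementations, the GPU devices are given priority when the operation is assigned to a device. For example, matmul has both CPU and GPU kernels. On a system with devices cpu:0 and gpu:0, gpu:0 will be selected to run matmul.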
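The device strings listed above follow a small, regular pattern, which can be illustrated with a short parser. This is only a sketch: parse_device and its regular expression are hypothetical helpers written for this page, not part of TensorFlow, and they cover only the two string forms shown here.

```python
import re

# Hypothetical helper, not a TensorFlow API: splits a device string such as
# "/cpu:0" or "/device:GPU:1" into a (device_type, index) pair.
_DEVICE_RE = re.compile(r"^/(?:device:)?(cpu|gpu):(\d+)$", re.IGNORECASE)


def parse_device(name):
    match = _DEVICE_RE.match(name)
    if match is None:
        raise ValueError("unrecognized device string: %r" % name)
    return match.group(1).upper(), int(match.group(2))


print(parse_device("/cpu:0"))
print(parse_device("/device:GPU:1"))
```

The index is the part that changes from one GPU to the next; the device type stays the same.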

Logging device placement

To find out which devices your operations and tensors are assigned to, create the session with the log_device_placement configuration option set to True.

import tensorflow as tf

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

You should see the following output:

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/device:GPU:0
a: /job:localhost/replica:0/task:0/device:GPU:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22.  28.]
 [ 49.  64.]]
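If you want to check placements programmatically rather than by eye, the log text can be reduced to a mapping from op name to device. The sketch below works on the sample output above, pasted in as a string; placements is a hypothetical helper, and how you capture the log text from your own process is left as an assumption.

```python
# Sample text copied from the output above. How the log is captured in a
# real run is an assumption and is not shown here.
LOG = """\
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/device:GPU:0
a: /job:localhost/replica:0/task:0/device:GPU:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
"""


def placements(log_text):
    """Map each op name to the device part of its placement line."""
    result = {}
    for line in log_text.splitlines():
        name, sep, device = line.partition(": ")
        # Placement lines look like "<op>: /job:.../task:N/<device>".
        if sep and device.startswith("/job:"):
            result[name] = device.split("/", 4)[-1]
    return result


for op, device in placements(LOG).items():
    print(op, "->", device)
```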

Manual device placement

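If you would like a particular operation to run on a device of your choice instead of the one selected automatically, you can use with tf.device to create a device context, so that all operations within that context have the same device assignment.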

# Creates a graph.
with tf.device('/cpu:0'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

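You will see that a and b are now assigned to cpu:0. Since no device was explicitly specified for the MatMul operation, the TensorFlow runtime chooses one based on the operation and the available devices (gpu:0 in this example) and automatically copies tensors between devices if required.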

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22.  28.]
 [ 49.  64.]]

Allowing GPU memory growth

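By default, TensorFlow maps nearly all of the GPU memory of all GPUs visible to the process (subject to CUDA_VISIBLE_DEVICES). This is done to use the relatively precious GPU memory resources on the devices more efficiently by reducing memory fragmentation (https://en.wikipedia.org/wiki/Fragmentation_(computing)).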

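In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow its memory usage only as needed. TensorFlow provides two Config options on the Session to control this.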

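The first is the allow_growth option, which attempts to allocate only as much GPU memory as runtime allocations require: it starts out allocating very little memory, and as Sessions are run and more GPU memory is needed, the GPU memory region used by the TensorFlow process is extended. Note that memory is not released, since that can lead to even worse memory fragmentation. To turn this option on, set it in the ConfigProto as follows: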

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

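The second is the per_process_gpu_memory_fraction option, which determines the fraction of the overall amount of memory that each visible GPU should be allocated. For example, you can tell TensorFlow to allocate only 40% of the total memory of each GPU as follows: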

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)

This is useful if you want to truly bound the amount of GPU memory available to the TensorFlow process.
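To get a feel for what a given fraction means in absolute terms, the arithmetic can be sketched in plain Python. The 12 GiB card size is an assumption chosen for illustration, memory_budget_bytes is a hypothetical helper, and the amount TensorFlow actually reserves on a real device may differ from this simple product.

```python
GIB = 1024 ** 3


def memory_budget_bytes(total_bytes, fraction):
    """Bytes covered by the given fraction of one GPU's total memory."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_bytes * fraction)


total = 12 * GIB  # assumed card size, for illustration only
budget = memory_budget_bytes(total, 0.4)
print("budget: %.2f GiB of %.2f GiB" % (budget / GIB, total / GIB))
```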

Using a single GPU on a multi-GPU system

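If you have more than one GPU in your system, the GPU with the lowest ID is selected by default. If you would like to run on a different GPU, you need to specify the preference explicitly: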

# Creates a graph.
with tf.device('/device:GPU:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

If the device you have specified does not exist, you will get an InvalidArgumentError:

InvalidArgumentError: Invalid argument: Cannot assign a device to node 'b':
Could not satisfy explicit device specification '/device:GPU:2'
   [[Node: b = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3,2]
   values: 1 2 3...>, _device="/device:GPU:2"]()]]

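If you would like TensorFlow to automatically choose an existing and supported device to run the operations when the specified one does not exist, you can set the allow_soft_placement configuration option to True when creating the session.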

# Creates a graph.
with tf.device('/device:GPU:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with allow_soft_placement and log_device_placement set
# to True.
sess = tf.Session(config=tf.ConfigProto(
      allow_soft_placement=True, log_device_placement=True))
# Runs the op.
print(sess.run(c))
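The difference between strict and soft placement can be pictured with a toy model in plain Python. This is not TensorFlow's placer: resolve_device is a hypothetical function, and the fallback order it uses (lowest-numbered GPU first, then the CPU) is an assumption made for the sketch.

```python
def resolve_device(requested, available, allow_soft_placement=False):
    """Toy model of device resolution; not TensorFlow's actual placer."""
    if requested in available:
        return requested
    if not allow_soft_placement:
        raise ValueError(
            "Could not satisfy explicit device specification %r" % requested)
    # Assumed fallback order: any available GPU first, otherwise the CPU.
    gpus = sorted(d for d in available if "GPU" in d)
    return gpus[0] if gpus else "/cpu:0"


available = ["/cpu:0", "/device:GPU:0"]
chosen = resolve_device("/device:GPU:2", available, allow_soft_placement=True)
print(chosen)
```

Without allow_soft_placement=True, the same call raises an error, mirroring the strict behaviour described above.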

Using multiple GPUs

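If you would like to run TensorFlow on multiple GPUs, you can construct your model in a multi-tower fashion, where each tower is assigned to a different GPU. For example: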

# Creates a graph.
c = []
for d in ['/device:GPU:2', '/device:GPU:3']:
  with tf.device(d):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
  sum = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(sum))

You will see the following output.

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K20m, pci bus
id: 0000:02:00.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla K20m, pci bus
id: 0000:03:00.0
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: Tesla K20m, pci bus
id: 0000:83:00.0
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: Tesla K20m, pci bus
id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/device:GPU:3
Const_2: /job:localhost/replica:0/task:0/device:GPU:3
MatMul_1: /job:localhost/replica:0/task:0/device:GPU:3
Const_1: /job:localhost/replica:0/task:0/device:GPU:2
Const: /job:localhost/replica:0/task:0/device:GPU:2
MatMul: /job:localhost/replica:0/task:0/device:GPU:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[  44.   56.]
 [  98.  128.]]
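The final matrix is simply the two identical tower results added together. The arithmetic can be checked without a GPU using plain Python lists; matmul and add_n here are small stand-ins written for this sketch, not the TensorFlow ops of the same name.

```python
def matmul(x, y):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def add_n(matrices):
    """Element-wise sum of equally shaped matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*matrices)]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Each "tower" computes the same product; the results are then summed.
towers = [matmul(a, b) for _ in range(2)]
total = add_n(towers)
print(total)
```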

The cifar10 tutorial is a good example demonstrating how to do training with multiple GPUs.


Last updated: 2017-12-18