I am trying to determine how large a model I can run on my CPU and on my GPU. Using the code below as a template, I started with a small network and kept increasing the parameter count until something failed. To my surprise, the first failure was the code below running out of CPU memory.
My GPU has 12 GB of RAM and my CPU has 128 GB of RAM. Why does the CPU run out of memory before the GPU, and how can I get TensorFlow to use more memory on the CPU?
import time
import tensorflow as tf
import numpy      as np
from tensorflow import keras
(X_train, y_train), (X_test, y_test) = keras.datasets.cifar10.load_data()
X_train_scaled  = X_train/ 255
X_test_scaled   = X_test / 255
y_train_encoded = keras.utils.to_categorical(y_train, num_classes = 10, dtype = 'float32')
y_test_encoded  = keras.utils.to_categorical(y_test,  num_classes = 10, dtype = 'float32')
def get_model():
  model = keras.Sequential([
    keras.layers.Flatten(input_shape=(32,32,3)),
    keras.layers.Dense(20000, activation='relu'),
    keras.layers.Dense(20000, activation='relu'),
    keras.layers.Dense(10, activation='sigmoid')
  ])
  model.compile(optimizer='SGD',
    loss='categorical_crossentropy',
    metrics=['accuracy'])
  return model
with tf.device('/GPU:0'):
  model_gpu = get_model()
  t0 = time.time()
  model_gpu.fit(X_train_scaled, y_train_encoded, epochs = 1)
  t1 = time.time()
  print('GPU: ', t1 - t0)
with tf.device('/CPU:0'):
  model_cpu = get_model()
  t0 = time.time()
  model_cpu.fit(X_train_scaled, y_train_encoded, epochs = 1)
  t1 = time.time()
  print('CPU: ', t1 - t0)

When I run the code above, I get the following output.
2022-03-15 02:04:41.968970: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0b:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-03-15 02:04:41.972553: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0b:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-03-15 02:04:41.972749: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0b:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-03-15 02:04:41.973141: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-15 02:04:41.975086: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0b:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-03-15 02:04:41.975318: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0b:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-03-15 02:04:41.975535: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0b:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-03-15 02:04:42.332615: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0b:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-03-15 02:04:42.332868: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0b:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-03-15 02:04:42.332901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1609] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2022-03-15 02:04:42.333147: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0b:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-03-15 02:04:42.333209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9396 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:0b:00.0, compute capability: 8.6
2022-03-15 02:04:44.056461: I tensorflow/stream_executor/cuda/cuda_blas.cc:1774] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
1563/1563 [==============================] - 36s 23ms/step - loss: 1.7839 - accuracy: 0.3684
GPU:  36.7231342792511
2022-03-15 02:05:22.016145: E tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 2147483648 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-15 02:05:22.016190: W ./tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 2147483648
2022-03-15 02:05:22.786090: E tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 1932735232 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-15 02:05:22.786147: W ./tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 1932735232
2022-03-15 02:05:23.549123: E tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 1739461632 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-15 02:05:23.549172: W ./tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 1739461632
2022-03-15 02:05:24.478277: E tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 2147483648 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-15 02:05:24.478325: W ./tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 2147483648
2022-03-15 02:05:35.315236: E tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 2147483648 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-15 02:05:35.315301: W ./tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 2147483648
2022-03-15 02:05:36.087838: E tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 2147483648 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-15 02:05:36.087884: W ./tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 2147483648
2022-03-15 02:05:36.087902: W tensorflow/core/common_runtime/bfc_allocator.cc:462] Allocator (gpu_host_bfc) ran out of memory trying to allocate 1.49GiB (rounded to 1600000000)requested by op SGD/SGD/update_2/ResourceApplyGradientDescent
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation. 
Current allocation summary follows.
2022-03-15 02:05:36.087925: I tensorflow/core/common_runtime/bfc_allocator.cc:1010] BFCAllocator dump for gpu_host_bfc
2022-03-15 02:05:36.087942: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (256):  Total Chunks: 5, Chunks in use: 5. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 28B client-requested in use in bin.
2022-03-15 02:05:36.087949: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (512):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.087954: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (1024):         Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.087959: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (2048):         Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.087964: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (4096):         Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.087968: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (8192):         Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.087972: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (16384):        Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.087977: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (32768):        Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.087981: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (65536):        Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.087985: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (131072):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.087990: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (262144):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.087995: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (524288):       Total Chunks: 1, Chunks in use: 1. 781.2KiB allocated for chunks. 781.2KiB in use in bin. 781.2KiB client-requested in use in bin.
2022-03-15 02:05:36.088001: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (1048576):      Total Chunks: 1, Chunks in use: 0. 1.24MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.088005: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (2097152):      Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.088009: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (4194304):      Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.088014: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (8388608):      Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.088018: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (16777216):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.088023: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (33554432):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.088028: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (67108864):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.088033: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (134217728):    Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.088037: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (268435456):    Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.088041: I tensorflow/core/common_runtime/bfc_allocator.cc:1033] Bin for 1.49GiB was 256.00MiB, Chunk State: 
2022-03-15 02:05:36.088046: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Next region of size 2097152
2022-03-15 02:05:36.088052: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 303850000 of size 256 next 1
2022-03-15 02:05:36.088057: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 303850100 of size 256 next 2
2022-03-15 02:05:36.088060: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 303850200 of size 256 next 3
2022-03-15 02:05:36.088063: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 303850300 of size 256 next 4
2022-03-15 02:05:36.088067: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 303850400 of size 256 next 5
2022-03-15 02:05:36.088071: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 303850500 of size 800000 next 6
2022-03-15 02:05:36.088074: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free  at 303913a00 of size 1295872 next 18446744073709551615
2022-03-15 02:05:36.088079: I tensorflow/core/common_runtime/bfc_allocator.cc:1071]      Summary of in-use Chunks by size: 
2022-03-15 02:05:36.088083: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 5 Chunks of size 256 totalling 1.2KiB
2022-03-15 02:05:36.088087: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 800000 totalling 781.2KiB
2022-03-15 02:05:36.088091: I tensorflow/core/common_runtime/bfc_allocator.cc:1078] Sum Total of in-use chunks: 782.5KiB
2022-03-15 02:05:36.088095: I tensorflow/core/common_runtime/bfc_allocator.cc:1080] total_region_allocated_bytes_: 2097152 memory_limit_: 68719476736 available bytes: 68717379584 curr_region_allocation_bytes_: 2147483648
2022-03-15 02:05:36.088103: I tensorflow/core/common_runtime/bfc_allocator.cc:1086] Stats: 
Limit:                     68719476736
InUse:                          801280
MaxInUse:                       801280
NumAllocs:                        6261
MaxAllocSize:                   800000
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0
2022-03-15 02:05:36.088110: W tensorflow/core/common_runtime/bfc_allocator.cc:474] ***************************************_____________________________________________________________
2022-03-15 02:05:36.088143: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at training_ops.cc:973 : RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[20000,20000] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator gpu_host_bfc

I think putting 20000 neurons in each hidden layer is the problem. You can reduce the number of neurons in the hidden layers to see the difference; I used 128 and 64 neurons to get it running in CPU mode, as in the code below. (A rough estimate of why the original layer sizes are so heavy follows after the snippet.)
def get_model():
  model = keras.Sequential([
    keras.layers.Flatten(input_shape=(32,32,3)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(10, activation='sigmoid')
  ])
  model.compile(optimizer='SGD',
    loss='categorical_crossentropy',
    metrics=['accuracy'])
  return model

https://stackoverflow.com/questions/71476285
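As a rough back-of-the-envelope check (my own numbers, derived only from the question's get_model() and the log above, not part of the original answer): the 20000x20000 weight matrix of the second hidden layer alone is 20000 * 20000 * 4 bytes = 1.6 GB in float32, which matches the 1,600,000,000-byte (1.49 GiB) allocation of shape [20000,20000] that fails in the log, and training requests several buffers of roughly that size at once (weights, gradients, temporaries):

# Back-of-the-envelope memory estimate for the original model:
# Flatten(32*32*3) -> Dense(20000) -> Dense(20000) -> Dense(10).
# Pure Python arithmetic; no TensorFlow required.
BYTES_PER_FLOAT32 = 4

def dense_params(n_in, n_out):
    # weights (n_in * n_out) plus biases (n_out) of one Dense layer
    return n_in * n_out + n_out

layers = [(32 * 32 * 3, 20000), (20000, 20000), (20000, 10)]
total_params = sum(dense_params(n_in, n_out) for n_in, n_out in layers)

print(f'total parameters : {total_params:,}')                       # ~461.7 million
print(f'weights (float32): {total_params * BYTES_PER_FLOAT32 / 1e9:.2f} GB')

# The 20000x20000 weight matrix alone matches the allocation that fails
# in the log (shape [20000,20000], rounded to 1600000000 bytes):
print(f'20000x20000 matrix: {20000 * 20000 * BYTES_PER_FLOAT32 / 1e9:.1f} GB')

With 128 and 64 units, the largest weight matrix (3072 x 128) is only about 1.6 MB, which is why the answer's smaller model fits comfortably on either device.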