
cuDNN installation

vanguard
Modified 2021-08-30 18:08:18

NVIDIA cuDNN is a GPU-accelerated library of primitives for deep neural networks.

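  1. Hardware preparation (power supply + motherboard + CPU + fan + memory + storage/NVMe SSD/HDD + NVIDIA graphics card)
  2. Operating system and tool installation (Ubuntu 20.04 + update + net-tools + ssh + vim + python3-pip + samba + git + xrdp + virtualenv)
  3. GPU driver and NVIDIA software installation (Driver + CUDA + cuDNN + TensorRT)
    1. Driver https://www.nvidia.com/Download/index.aspx
    2. CUDA https://developer.nvidia.com/cuda-downloads
    3. cuDNN https://developer.nvidia.com/rdp/cudnn-download
    4. TensorRT https://developer.nvidia.com/zh-cn/tensorrt
  4. Dependency and framework installation (tensorflow-gpu + pytorch + opencv-python + yolo...)
  5. Model training and inference, in containers or directly on the host (docker + nvidia-docker...)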

cuDNN installation procedure (you currently need to log in to obtain this download link)

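# Quote the URL and name the output file, otherwise wget saves it with the query string in the file name.
wget -O cudnn-11.4-linux-x64-v8.2.2.26.tgz "https://developer.download.nvidia.cn/compute/machine-learning/cudnn/secure/8.2.2/11.4_07062021/cudnn-11.4-linux-x64-v8.2.2.26.tgz?zVO0xngn9RHkR6idYHi7_WjTxJhRatqOB0Tsrbzn-y1zIokHbv0PQO_U8XLu7aMydM33JWOczvkirvAZ9BNN-aqsIyCpxg5Vc_sbF6AF8K6lGSXQ-CZXUe6IBt-5mcsMERGmkvQACeYRwKLqk7xy76mzV9epqp5_EgFkNFt7RcvA0T97ozdTs6e63yabuR5LkFx-de-Oa6IPbuU"
tar -xzvf cudnn-11.4-linux-x64-v8.2.2.26.tgz
# The archive unpacks into cuda/. cuDNN 8 ships several headers, so copy all of them, not only cudnn.h.
sudo cp -a cuda/include/cudnn*.h /usr/local/cuda/include/
sudo cp -a cuda/lib64/libcudnn* /usr/local/cuda/lib64/
# Optional checks of the driver and the CUDA toolkit:
# nvidia-smi
# nvcc -V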
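
To confirm which cuDNN version ended up under the CUDA directory, the version macros can be read back from the installed header. The sketch below is not part of the original procedure. It assumes the headers were copied to /usr/local/cuda/include and relies on cuDNN 8 keeping `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` in `cudnn_version.h`; the function name `cudnn_version_from_header` and the variable `CUDNN_HEADER` are made up for illustration.

```shell
# Print MAJOR.MINOR.PATCH parsed from a cudnn_version.h-style header.
# Returns non-zero if the file is unreadable or a macro is missing.
cudnn_version_from_header() {
    [ -r "$1" ] || { echo "cannot read header: $1" >&2; return 1; }
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END {
            if (major == "" || minor == "" || patch == "") exit 1
            printf "%s.%s.%s\n", major, minor, patch
        }
    ' "$1"
}

# Assumed install location; adjust if CUDA lives elsewhere.
CUDNN_HEADER=/usr/local/cuda/include/cudnn_version.h
if [ -r "$CUDNN_HEADER" ]; then
    echo "cuDNN $(cudnn_version_from_header "$CUDNN_HEADER")"
else
    echo "no cudnn_version.h at $CUDNN_HEADER"
fi
```

If the header is reported missing even though `cudnn.h` is present, the other `cudnn*.h` files were probably not copied.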

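The hard part is still installing CUDA. If an earlier toolkit or driver installation gets in the way, remove it first as described in NVIDIA's guide: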

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#removing-cuda-tk-and-driver

# To remove CUDA Toolkit:
sudo apt-get --purge remove "*cublas*" "*cufft*" "*curand*" \
 "*cusolver*" "*cusparse*" "*npp*" "*nvjpeg*" "cuda*" "nsight*" 
# To remove NVIDIA Drivers:
sudo apt-get --purge remove "*nvidia*"
# To clean up the uninstall:
sudo apt-get autoremove
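
After purging, it can be worth checking that nothing matching those patterns is still known to dpkg. The following is only a sketch: the function name `filter_cuda_packages` is hypothetical, and its regular expression is a loose translation of the globs used above rather than an exact equivalent.

```shell
# Keep only package names (one per line on stdin) that look like
# CUDA toolkit or NVIDIA driver packages.
filter_cuda_packages() {
    grep -E 'cublas|cufft|curand|cusolver|cusparse|npp|nvjpeg|^cuda|^nsight|nvidia' || true
}

# dpkg-query prints one known package name per line on Debian/Ubuntu.
if command -v dpkg-query >/dev/null 2>&1; then
    dpkg-query -W -f='${Package}\n' | filter_cuda_packages
fi
```

An empty result suggests the removal was complete; any names printed are candidates for another `apt-get --purge remove`.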

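Install the driver separately where possible, because some software depends on the driver but not on CUDA, especially if you are replacing the stock driver. Once everything is installed, set the environment variables: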

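# Append the two export lines to ~/.bashrc, then reload it.
export PATH=/usr/local/cuda-11.4/bin${PATH:+:${PATH}}
# Keep this on one line: an indented continuation line would split the value into two words.
export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
source ~/.bashrc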
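
Because the exports prepend unconditionally, every time the startup file is sourced again the same directories are added once more. A small guard avoids the duplicates. This is a sketch under assumptions: `prepend_once` and `CUDA_HOME` are names introduced here for illustration, and the path is the CUDA 11.4 location used above.

```shell
# Print LIST (colon-separated) with DIR prepended,
# unless DIR is already one of its entries.
prepend_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)        printf '%s\n' "$1${2:+:$2}" ;;
    esac
}

# Assumed toolkit location, matching the exports above.
CUDA_HOME=/usr/local/cuda-11.4
PATH=$(prepend_once "$CUDA_HOME/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_once "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

Running the block twice leaves both variables unchanged the second time.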

If cuDNN is not installed, TensorFlow may skip the GPU altogether:

2021-08-26 19:55:22.789937: W 
tensorflow/stream_executor/platform/default/dso_loader.cc:64] 
Could not load dynamic library 'libcudnn.so.8'; 
dlerror: libcudnn.so.8: 
cannot open shared object file: No such file or directory; 
LD_LIBRARY_PATH: /usr/local/cuda-11.4/lib64

2021-08-26 19:55:22.790001: W 
tensorflow/core/common_runtime/gpu/gpu_device.cc:1835] Cannot dlopen some GPU libraries. 
Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. 
Follow the guide at 
https://www.tensorflow.org/install/gpu 
for how to download and setup the required libraries for your platform.

Skipping registering GPU devices...

2021-08-26 19:55:22.790631: I 
tensorflow/core/platform/cpu_feature_guard.cc:142] 
This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) 
to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

2021-08-26 19:55:23.528475: 
I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] 
None of the MLIR Optimization Passes are enabled (registered 2)
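
The warning above prints the `LD_LIBRARY_PATH` that was searched, so a quick first check is whether any directory on that path actually contains `libcudnn.so.8`. The sketch below covers only `LD_LIBRARY_PATH`; the dynamic loader also consults the ldconfig cache and the default system directories, which are not checked here. The function name `find_lib_in_path` is hypothetical, and the code assumes a POSIX-style shell such as bash or sh.

```shell
# Print the first directory of a colon-separated search path that
# contains the named library; return 1 if none does.
find_lib_in_path() {
    flp_lib=$1
    flp_old_ifs=$IFS
    IFS=:
    for flp_dir in $2; do
        if [ -n "$flp_dir" ] && [ -e "$flp_dir/$flp_lib" ]; then
            IFS=$flp_old_ifs
            printf '%s\n' "$flp_dir"
            return 0
        fi
    done
    IFS=$flp_old_ifs
    return 1
}

if found_dir=$(find_lib_in_path libcudnn.so.8 "${LD_LIBRARY_PATH:-}"); then
    echo "libcudnn.so.8 found in $found_dir"
else
    echo "libcudnn.so.8 not found on LD_LIBRARY_PATH"
fi
```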

Once cuDNN is installed, TensorFlow uses the GPU, and you can watch GPU memory and other usage with nvidia-smi:

2021-08-30 16:57:03.457415: I 
tensorflow/core/platform/cpu_feature_guard.cc:142] 
This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) 
to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

2021-08-30 16:57:05.198665: I 
tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] 
Created device /job:localhost/replica:0/task:0/device:GPU:0 with 17540 MB memory:  
-> device: 0, name: NVIDIA GeForce RTX 3090, 
pci bus id: 0000:02:00.0, compute capability: 8.6

2021-08-30 16:57:06.848155: I 
tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] 
None of the MLIR Optimization Passes are enabled (registered 2)

Epoch 1/5
2021-08-30 16:57:10.171347: I 
tensorflow/stream_executor/cuda/cuda_blas.cc:1760] 
TensorFloat-32 will be used for the matrix multiplication. 
This will only be logged once.
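
The two startup logs differ in one telltale line each: "Skipping registering GPU devices..." when the libraries are missing, and "Created device ... device:GPU:0" when the GPU is picked up. A script can tell the cases apart by scanning for those strings. This is a rough sketch keyed to the exact messages quoted in this article, which may change between TensorFlow versions; `tf_gpu_status` is a made-up helper name.

```shell
# Classify a TensorFlow startup log read from stdin as
# gpu-skipped, gpu-registered, or unknown.
tf_gpu_status() {
    tf_log=$(cat)
    if printf '%s\n' "$tf_log" | grep -q 'Skipping registering GPU devices'; then
        echo "gpu-skipped"
    elif printf '%s\n' "$tf_log" | grep -Eq 'Created device .*device:GPU:[0-9]+'; then
        echo "gpu-registered"
    else
        echo "unknown"
    fi
}
```

Since TensorFlow writes these messages to stderr, a typical use would pipe both streams of a training script into the function, for example `python3 train.py 2>&1 | tf_gpu_status`, where `train.py` stands for your own script.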
Mon Aug 30 17:17:31 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
| 35%   50C    P2   109W / 350W |  23055MiB / 24265MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:82:00.0 Off |                  N/A |
| 34%   44C    P0   110W / 350W |      0MiB / 24268MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      5741      C   python                          23053MiB |
+-----------------------------------------------------------------------------+
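
For scripted monitoring, the same memory figures can be requested in CSV form with `nvidia-smi --query-gpu` instead of reading the table by eye. The sketch below turns used and total memory into a percentage per GPU; the helper name `gpu_mem_percent` is hypothetical, and the block does nothing on a machine without nvidia-smi.

```shell
# Read "index, name, used_MiB, total_MiB" CSV lines on stdin and
# print the share of memory in use for each GPU.
gpu_mem_percent() {
    awk -F', *' '{ printf "GPU %s: %.0f%% (%d / %d MiB)\n", $1, 100 * $3 / $4, $3, $4 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,name,memory.used,memory.total \
               --format=csv,noheader,nounits | gpu_mem_percent
fi
```

With the figures from the table above, GPU 0 (23055 MiB of 24265 MiB) would be reported at about 95%.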


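Original-content statement: the author has authorized the Tencent Cloud Developer Community to publish this article. It may not be reproduced without permission.

In case of infringement, contact cloudcommunity@tencent.com to request removal.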
