CUDA® is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).
CUDA was developed with several design goals in mind:
Provide a small set of extensions to standard programming languages, like C, that enable a straightforward implementation of parallel algorithms. With CUDA C/C++, programmers can focus on the task of parallelization of the algorithms rather than spending time on their implementation.
Support heterogeneous computation where applications use both the CPU and GPU. Serial portions of applications are run on the CPU, and parallel portions are offloaded to the GPU. As such, CUDA can be incrementally applied to existing applications. The CPU and GPU are treated as separate devices that have their own memory spaces. This configuration also allows simultaneous computation on the CPU and GPU without contention for memory resources.
CUDA-capable GPUs have hundreds of cores that can collectively run thousands of computing threads. These cores have shared resources including a register file and a shared memory. The on-chip shared memory allows parallel tasks running on these cores to share data without sending it over the system memory bus.
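According to the official website, cuDNN includes optimizations targeted at specific kinds of models, and the library itself has been slimmed down into smaller pieces.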
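What's new in cuDNN 8
cuDNN 8 is optimized for the A100 GPU, delivering up to 5x the out-of-the-box performance of the V100 GPU, and includes new optimizations and APIs for applications such as conversational AI and computer vision. It has been redesigned for ease of use and application integration, while giving developers more flexibility.
cuDNN 8 highlights include:
- Tuned for peak performance on NVIDIA A100 GPUs, including the new TensorFloat-32, FP16, and FP32
- A redesigned low-level API that gives direct access to cuDNN kernels for greater control and performance tuning
- A backward compatibility layer that still supports cuDNN 7.x, so developers can transition smoothly to the new cuDNN 8 API
- New optimizations for computer vision, speech, and language understanding networks
- Operator fusion through a new API to accelerate convolutional neural networks
cuDNN 8 now ships as six smaller libraries, which allows finer-grained integration into applications. Developers can download cuDNN or pull it from the framework containers on NGC.
Key features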
Frameworks that use cuDNN
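On Windows, the main installation problem is matching package versions. There is no need to rush: the core idea is to look things up on the official sites.
The following NVIDIA® software must be installed on the system: the GPU driver, the CUDA Toolkit (CUPTI ships with it), and cuDNN.
I picked the Windows 11 version. What puzzles me is the naming: what does it tell us? My guess is that the Windows 11 and Windows 10 kernels are not really different. [So the Windows 11 upgrade changed nothing...]
After installation, check the compiler version: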
PS C:\Users\season> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Sep_13_20:11:50_Pacific_Daylight_Time_2021
Cuda compilation tools, release 11.5, V11.5.50
Build cuda_11.5.r11.5/compiler.30411180_0
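Since the whole installation hinges on version matching, it can help to read the CUDA release out of this output programmatically. The following is a minimal sketch, not part of the original setup: the helper name `parse_nvcc_release` is hypothetical, and the regular expression assumes the `release X.Y, VX.Y.Z` wording shown in the output above.

```python
import re

def parse_nvcc_release(output: str):
    """Return (release, full_version) from `nvcc --version` text, or None if not found."""
    match = re.search(r"release\s+(\d+\.\d+),\s+V(\d+\.\d+\.\d+)", output)
    if match is None:
        return None
    return match.group(1), match.group(2)

# Output copied from the session above. On a real machine the text could be
# obtained with subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
sample = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Sep_13_20:11:50_Pacific_Daylight_Time_2021
Cuda compilation tools, release 11.5, V11.5.50
Build cuda_11.5.r11.5/compiler.30411180_0"""

print(parse_nvcc_release(sample))  # ('11.5', '11.5.50')
```

The release number (here 11.5) is the part to compare against the cuDNN download page and the TensorFlow install guide.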
Documentation:
https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/
https://developer.nvidia.com/zh-cn/cudnn
https://docs.nvidia.com/deeplearning/cudnn/index.html
Find the cuDNN version that matches your CUDA version:
https://developer.nvidia.com/zh-cn/cudnn
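You do not have to configure the environment variables permanently at this step. You can set them each time before use, or follow other tutorials online to configure them permanently.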
SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5\bin;%PATH%
SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5\extras\CUPTI\lib64;%PATH%
SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5\include;%PATH%
SET PATH=C:\cuDNN\cudnn-11.5-windows-x64-v8.3.0.98\cuda\bin;%PATH%
# Configure conda
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes
# Show the current conda configuration
conda config --show
# Configure pip
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
# Create a new environment and activate it
conda create --name nlp_tf2 python=3.9
conda activate nlp_tf2
# Install tensorflow-gpu
pip install tensorflow-gpu==2.6.2
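When installing TensorFlow, pip is recommended because the conda package may not match, so use pip for this step. To be fair, I simply have not tried the tempting conda route.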
(nlp_tf2) C:\Users\season>pip install tensorflow==
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
ERROR: Could not find a version that satisfies the requirement tensorflow== (from versions:
2.5.0rc0, 2.5.0rc1, 2.5.0rc2, 2.5.0rc3, 2.5.0, 2.5.1, 2.5.2, 2.6.0rc0, 2.6.0rc1, 2.6.0rc2, 2.6.0, 2.6.1, 2.6.2, 2.7.0rc0, 2.7.0rc1, 2.7.0)
ERROR: No matching distribution found for tensorflow==
(nlp_tf2) C:\Users\season>pip install tensorflow-gpu==
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu== (from versions:
2.5.0, 2.5.1, 2.5.2, 2.6.0, 2.6.1, 2.6.2, 2.7.0rc0, 2.7.0rc1, 2.7.0)
ERROR: No matching distribution found for tensorflow-gpu==
https://tensorflow.google.cn/install/pip#windows_1
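According to the official page linked above, Python 3.9 should be paired with TensorFlow 2.6.
Set the environment variables on the cmd command line. This approach means the variables have to be set again before every future run, but the benefit is that several CUDA versions can coexist without touching the system-wide environment variables.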
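Running `pip install tensorflow-gpu==` with an empty version makes pip list every available version in its error message. As a rough sketch, that listing can be parsed to pick the newest stable release in a given series. The helper names `stable_versions` and `latest_in_series` are hypothetical, and the parsing assumes the `X.Y.Z` / `X.Y.ZrcN` naming seen in the output above.

```python
import re

def stable_versions(pip_output: str):
    """Collect final releases (no rc suffix) as sorted (major, minor, patch) tuples."""
    found = re.findall(r"\b(\d+)\.(\d+)\.(\d+)(rc\d+)?", pip_output)
    return sorted({(int(a), int(b), int(c)) for a, b, c, rc in found if not rc})

def latest_in_series(pip_output: str, major: int, minor: int):
    """Return the newest stable 'major.minor.x' version string, or None."""
    candidates = [v for v in stable_versions(pip_output) if v[:2] == (major, minor)]
    if not candidates:
        return None
    return ".".join(str(part) for part in max(candidates))

# Version listing copied from the pip error message above.
pip_error = (
    "ERROR: Could not find a version that satisfies the requirement "
    "tensorflow-gpu== (from versions: 2.5.0, 2.5.1, 2.5.2, 2.6.0, 2.6.1, "
    "2.6.2, 2.7.0rc0, 2.7.0rc1, 2.7.0)"
)

print(latest_in_series(pip_error, 2, 6))  # 2.6.2
```

This agrees with the `tensorflow-gpu==2.6.2` pin used in the install command.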
SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5\bin;%PATH%
SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5\extras\CUPTI\lib64;%PATH%
SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5\include;%PATH%
SET PATH=C:\cuDNN\cudnn-11.5-windows-x64-v8.3.0.98\cuda\bin;%PATH%
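The same per-session idea can be sketched in Python: build the four directories from the install roots and put them in front of the existing PATH value, which makes it easy to swap in a different CUDA version. The helper names below are hypothetical, the directory layout is taken from the SET commands above, and `ntpath` is used so the Windows-style joins behave the same on any platform. Whether changing PATH inside a Python process is enough for TensorFlow to find the DLLs is an assumption I have not verified here.

```python
import ntpath  # Windows path rules, regardless of the OS running this script

def cuda_path_entries(cuda_root: str, cudnn_root: str):
    """Directories that the SET PATH commands above add, in the same order."""
    return [
        ntpath.join(cuda_root, "bin"),
        ntpath.join(cuda_root, "extras", "CUPTI", "lib64"),
        ntpath.join(cuda_root, "include"),
        ntpath.join(cudnn_root, "cuda", "bin"),
    ]

def prepend_to_path(entries, current_path: str, sep: str = ";"):
    """Put entries ahead of current_path, skipping any that are already present."""
    existing = [p for p in current_path.split(sep) if p]
    seen = {p.lower() for p in existing}  # Windows paths are case-insensitive
    new = [e for e in entries if e.lower() not in seen]
    return sep.join(new + existing)

entries = cuda_path_entries(
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5",
    r"C:\cuDNN\cudnn-11.5-windows-x64-v8.3.0.98",
)
new_path = prepend_to_path(entries, r"C:\Windows\system32")
print(new_path)

# On the actual Windows machine this could be applied before importing TensorFlow
# (assumption: untested here):
#   import os
#   os.environ["PATH"] = prepend_to_path(entries, os.environ["PATH"])
```

Calling `prepend_to_path` twice with the same entries leaves the value unchanged, so the snippet can sit at the top of every script without growing PATH on each run.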
(nlp_tf2) C:\Users\season>python
Python 3.9.7 (default, Sep 16 2021, 16:59:28) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.reduce_sum(tf.random.normal([1000, 1000]))
2021-11-23 01:18:34.892308:
I tensorflow/core/platform/cpu_feature_guard.cc:142]
This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-11-23 01:18:35.377735:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510]
Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3495 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
<tf.Tensor: shape=(), dtype=float32, numpy=520.1074>
>>> version = tf.__version__
>>> gpu_ok = tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2021-11-24 23:56:25.051249: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /device:GPU:0 with 3272 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
>>> print("tf version:",version,"\nuse GPU",gpu_ok)
tf version: 2.6.2
use GPU True
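The deprecation warning above recommends `tf.config.list_physical_devices('GPU')` instead of `tf.test.is_gpu_available()`. A minimal sketch of the same check with the newer call follows. The wrapper `gpu_report` is a hypothetical helper, and it returns an empty result when TensorFlow is not installed so that it can run anywhere.

```python
def gpu_report():
    """Return the TensorFlow version and the names of the visible GPUs."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return {"tf_version": None, "gpus": []}
    gpus = tf.config.list_physical_devices("GPU")
    return {"tf_version": tf.__version__, "gpus": [gpu.name for gpu in gpus]}

report = gpu_report()
print("tf version:", report["tf_version"])
print("use GPU:", bool(report["gpus"]))
```

An empty `gpus` list with a valid `tf_version` usually points back to the version-matching or PATH steps earlier in this post.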
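Install WSL2. The official documentation explains it quite clearly.
Set up WSL2 + CUDA + Docker in 5 steps to solve deep learning development problems on Windows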
Windows+WSL2+CUDA+Docker
TensorFlow official GPU support documentation
CUDA official guide