目前我看官网主要推荐docker 方式了,那我们就用docker 方式试试。而且网上的安装教程也是docker 的居多【官方给出了一个教程】,我们也要与时俱进。
下面是我机器wsl kernel的版本:可见是没有最新,只有更新哈!
season@season:~$ uname -r
5.10.16.3-microsoft-standard-WSL2
官方文档:
PyTorch MNIST 测试,这是一个有目的的小型玩具机器学习示例,它强调了保持 GPU 忙碌以达到满意的 WSL2性能的重要性。与原生 Linux 一样,工作负载越小,就越有可能由于启动 GPU 进程的开销而导致性能下降。这种退化在 WSL2上更为明显,并且与原生 Linux 的规模不同。
从图中可以看出如果batch size小的话,很多时间会消耗在CUDA调用上,batch size=8的时候,时间消耗会是native CUDA的138%。如果提高batch size,让CUDA充分忙碌,性能可以接近native!
以下是原文链接。
https://developer.nvidia.com/blog/leveling-up-cuda-performance-on-wsl2-with-new-enhancements/
cuda 驱动 on wsl
https://developer.nvidia.com/cuda/wsl/download
不知道为啥,这个没有特别说明和wsl 有啥关系。。。我已经有驱动了,这个不知道装了个啥。做如图所示的选择。
特别注意,在wsl-2 中安装 cuda toolkit 要使用如下脚本:
红框处是单独的选项
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.5.1/local_installers/cuda-repo-wsl-ubuntu-11-5-local_11.5.1-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-11-5-local_11.5.1-1_amd64.deb
sudo apt-key add /var/cuda-repo-wsl-ubuntu-11-5-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
安装好以后进行测试:Black-Scholes模型,简称B-S模型,是一种对金融产品估价的数学模型。
cd /usr/local/cuda-11.5/samples/4_Finance/BlackScholes
season@season:/usr/local/cuda-11.5/samples/4_Finance/BlackScholes$ ll
total 60
drwxr-xr-x 4 root root 4096 Nov 24 23:07 ./
drwxr-xr-x 10 root root 4096 Nov 24 23:07 ../
drwxr-xr-x 2 root root 4096 Nov 24 23:07 .vscode/
-rw-r--r-- 1 root root 8382 Sep 21 01:38 BlackScholes.cu
-rw-r--r-- 1 root root 2787 Sep 21 01:38 BlackScholes_gold.cpp
-rw-r--r-- 1 root root 3646 Sep 21 01:38 BlackScholes_kernel.cuh
-rw-r--r-- 1 root root 13454 Sep 21 01:38 Makefile
-rw-r--r-- 1 root root 1859 Sep 21 01:38 NsightEclipse.xml
drwxr-xr-x 2 root root 4096 Nov 24 23:07 doc/
-rw-r--r-- 1 root root 189 Sep 21 01:38 readme.txt
season@season:/usr/local/cuda-11.5/samples/4_Finance/BlackScholes$ sudo make BlackScholes
>>> GCC Version is greater or equal to 5.1.0 <<<
/usr/local/cuda-11.5/bin/nvcc -ccbin g++ -I../../common/inc -m64 -maxrregcount=16 --threads 0 --std=c++11 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -o BlackScholes.o -c BlackScholes.cu
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
ptxas warning : For profile sm_86 adjusting per thread register count of 16 to lower bound of 24
ptxas warning : For profile sm_75 adjusting per thread register count of 16 to lower bound of 24
ptxas warning : For profile sm_70 adjusting per thread register count of 16 to lower bound of 24
ptxas warning : For profile sm_80 adjusting per thread register count of 16 to lower bound of 24
/usr/local/cuda-11.5/bin/nvcc -ccbin g++ -I../../common/inc -m64 -maxrregcount=16 --threads 0 --std=c++11 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -o BlackScholes_gold.o -c BlackScholes_gold.cpp
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/local/cuda-11.5/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -o BlackScholes BlackScholes.o BlackScholes_gold.o
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
mkdir -p ../../bin/x86_64/linux/release
cp BlackScholes ../../bin/x86_64/linux/release
season@season:/usr/local/cuda-11.5/samples/4_Finance/BlackScholes$ ./BlackScholes
[./BlackScholes] - Starting...
GPU Device 0: "Ampere" with compute capability 8.6
Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
...generating input data in CPU mem.
...copying input data to GPU mem.
Data init done.
Executing Black-Scholes GPU kernel (512 iterations)...
Options count : 8000000
BlackScholesGPU() time : 0.261508 msec
Effective memory bandwidth: 305.918207 GB/s
Gigaoptions per second : 30.591821
BlackScholes, Throughput = 30.5918 GOptions/s, Time = 0.00026 s, Size = 8000000 options, NumDevsUsed = 1, Workgroup = 128
Reading back GPU results...
Checking the results...
...running CPU calculations.
Comparing the results...
L1 norm: 1.741792E-07
Max absolute error: 1.192093E-05
Shutting down...
...releasing GPU memory.
...releasing CPU memory.
Shutdown done.
[BlackScholes] - Test Summary
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
Test passed
安装标准docker
curl https://get.docker.com | sh
之后输出:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 18617 100 18617 0 0 25294 0 --:--:-- --:--:-- --:--:-- 25294
# Executing docker install script, commit: 93d2499759296ac1f9c510605fef85052a2c32be
WSL DETECTED: We recommend using Docker Desktop for Windows.
Please get Docker Desktop from https://www.docker.com/products/docker-desktop
You may press Ctrl+C now to abort this script.
+ sleep 20
+ sudo -E sh -c apt-get update -qq >/dev/null
+ sudo -E sh -c DEBIAN_FRONTEND=noninteractive apt-get install -y -qq apt-transport-https ca-certificates curl >/dev/null
+ sudo -E sh -c curl -fsSL "https://download.docker.com/linux/ubuntu/gpg" | gpg --dearmor --yes -o /usr/share/keyrings/docker-archive-keyring.gpg
+ sudo -E sh -c echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu focal stable" > /etc/apt/sources.list.d/docker.list
+ sudo -E sh -c apt-get update -qq >/dev/null
+ sudo -E sh -c DEBIAN_FRONTEND=noninteractive apt-get install -y -qq --no-install-recommends docker-ce-cli docker-scan-plugin docker-ce >/dev/null
+ version_gte 20.10
+ [ -z ]
+ return 0
+ sudo -E sh -c DEBIAN_FRONTEND=noninteractive apt-get install -y -qq docker-ce-rootless-extras >/dev/null
================================================================================
To run Docker as a non-privileged user, consider setting up the
Docker daemon in rootless mode for your user:
dockerd-rootless-setuptool.sh install
Visit https://docs.docker.com/go/rootless/ to learn about rootless mode.
To run the Docker daemon as a fully privileged service, but granting non-root
users access, refer to https://docs.docker.com/go/daemon-access/
WARNING: Access to the remote API on a privileged Docker daemon is equivalent
to root access on the host. Refer to the 'Docker daemon attack surface'
documentation for details: https://docs.docker.com/go/attack-surface/
================================================================================
装完之后发现,这个突然出现的玩意也是挺占用内存的
安装 NVIDIA Container 以及 Toolkit
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list
$ sudo apt-get update
$ sudo apt-get install -y nvidia-docker2
下面我们继续安装官网给出的测试docker 进行一下测试。
https://docs.nvidia.com/cuda/wsl-user-guide/index.html
season@season:~$ sudo service docker stop
[sudo] password for season:
* Docker already stopped - file /var/run/docker-ssd.pid not found.
season@season:~$ sudo service docker start
* Starting Docker: docker [ OK ]
season@season:~$ sudo docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
Unable to find image 'nvcr.io/nvidia/k8s/cuda-sample:nbody' locally
nbody: Pulling from nvidia/k8s/cuda-sample
d519e2592276: Pull complete
d22d2dfcfa9c: Pull complete
b3afe92c540b: Pull complete
b25f8d7adb24: Pull complete
ddb025f124b9: Pull complete
fe72fda9c19e: Pull complete
c6a265e4ffa3: Pull complete
c931a9542ebf: Pull complete
f7eb321dd245: Pull complete
d67fd954fbd5: Pull complete
Digest: sha256:a2117f5b8eb3012076448968fd1790c6b63975c6b094a8bd51411dee0c08440d
Status: Downloaded newer image for nvcr.io/nvidia/k8s/cuda-sample:nbody
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies=<N> (number of bodies (>= 1) to run in simulation)
-device=<d> (where d=0,1,2.... for the CUDA device to use)
-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy=<file.bin> (load a tipsy model file for simulation)
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Ampere" with compute capability 8.6
> Compute 8.6 CUDA device: [NVIDIA GeForce RTX 3060 Laptop GPU]
30720 bodies, total time for 10 iterations: 25.952 ms
= 363.636 billion interactions per second
= 7272.727 single-precision GFLOP/s at 20 flops per interaction
season@season:~$
启动脚本:
docker run -it --gpus all -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter
测试效果,感觉跑结果交互起来非常缓慢,下面这段程序大概跑了5-10分钟
import tensorflow as tf
version = tf.__version__
gpu_ok = tf.test.is_gpu_available()
WARNING:tensorflow:From <ipython-input-3-11c8f8f7a7c6>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
print("tf version:",version,"\nuse GPU",gpu_ok)
tf version: 2.1.0
use GPU True
tf.reduce_sum(tf.random.normal([1000, 1000]))
<tf.Tensor: shape=(), dtype=float32, numpy=1321.2869>
姑且把日志贴出来,大佬分析分析啥原因,会不是时间时区不一样?造成反映慢。。。
[I 16:29:45.604 NotebookApp] 302 GET /?token=99b4290f4d03e9fe537a7fe14872d20f2a50f8a23c81c247 (172.17.0.1) 0.55ms
[I 16:29:49.983 NotebookApp] Creating new notebook in
[I 16:29:50.024 NotebookApp] Writing notebook-signing key to /root/.local/share/jupyter/notebook_secret
[I 16:29:50.496 NotebookApp] Kernel started: 24d0d681-bfac-4723-9e73-541c5c4f2443
2021-11-25 16:30:05.794652: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2021-11-25 16:30:05.824203: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
2021-11-25 16:30:29.686851: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2021-11-25 16:30:29.700683: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2303995000 Hz
2021-11-25 16:30:29.702117: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x484f3f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-11-25 16:30:29.702147: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-11-25 16:30:29.704570: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-11-25 16:30:30.076279: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:30:30.076510: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x48c1ac0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-11-25 16:30:30.076562: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce RTX 3060 Laptop GPU, Compute Capability 8.6
2021-11-25 16:30:30.077105: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:30:30.077137: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3060 Laptop GPU computeCapability: 8.6
coreClock: 1.702GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2021-11-25 16:30:30.077168: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-11-25 16:30:30.077183: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-11-25 16:30:30.111039: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-11-25 16:30:30.116390: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-11-25 16:30:30.162423: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-11-25 16:30:30.167253: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-11-25 16:30:30.167342: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-11-25 16:30:30.167733: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:30:30.167962: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:30:30.168006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2021-11-25 16:30:30.168473: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
[I 16:31:50.494 NotebookApp] Saving file at /Untitled.ipynb
2021-11-25 16:33:20.133916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-11-25 16:33:20.133958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2021-11-25 16:33:20.133980: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2021-11-25 16:33:20.135139: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:33:20.135166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1324] Could not identify NUMA node of platform GPU id 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-11-25 16:33:20.135374: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:33:20.135442: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/device:GPU:0 with 4788 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
2021-11-25 16:33:20.152265: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:33:20.152312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3060 Laptop GPU computeCapability: 8.6
coreClock: 1.702GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2021-11-25 16:33:20.152357: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-11-25 16:33:20.152377: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-11-25 16:33:20.152406: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-11-25 16:33:20.152427: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-11-25 16:33:20.152446: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-11-25 16:33:20.152484: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-11-25 16:33:20.152507: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-11-25 16:33:20.152726: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:33:20.152927: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:33:20.152951: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
[I 16:33:51.283 NotebookApp] Saving file at /Untitled.ipynb
2021-11-25 16:35:09.708118: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:35:09.708166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3060 Laptop GPU computeCapability: 8.6
coreClock: 1.702GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2021-11-25 16:35:09.708200: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-11-25 16:35:09.708219: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-11-25 16:35:09.708245: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-11-25 16:35:09.708253: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-11-25 16:35:09.708289: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolveeerer.eeeeer.so.10
2021-11-25 16:35:09.708310: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-11-25 16:35:09.708317: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-11-25 16:35:09.708564: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:35:09.708926: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:35:09.708955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2021-11-25 16:35:09.709019: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-11-25 16:35:09.709023: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2021-11-25 16:35:09.709027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2021-11-25 16:35:09.709380: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:35:09.709410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1324] Could not identify NUMA node of platform GPU id 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-11-25 16:35:09.709641: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:35:09.709682: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4788 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
安装WSL2,官方文档说的比较清楚了
5步搭建wsl2+cuda+docker解决windows深度学习开发问题
Windows+WSL2+CUDA+Docker
tensor flow 官方gpu 支持文档
cuda 官方指导
本文主要参考:CUDA on WSL User Guide