
Could not insert 'nvidia_352': No such device

Stack Overflow user
Asked 2015-09-10 13:35:22
5 answers · 24.6K views · 0 followers · 16 votes

I am trying to run Caffe on Ubuntu Linux. After installing it, I ran Caffe in GPU mode and got this error:

I0910 13:28:13.606891 10629 caffe.cpp:296] Use GPU with device ID 0
modprobe: ERROR: could not insert 'nvidia_352': No such device
F0910 13:28:13.728612 10629 common.cpp:142] Check failed: error == cudaSuccess (38 vs. 0)  no CUDA-capable device is detected
*** Check failure stack trace: ***
    @     0x7ffd3b9a7daa  (unknown)
    @     0x7ffd3b9a7ce4  (unknown)
    @     0x7ffd3b9a76e6  (unknown)
    @     0x7ffd3b9aa687  (unknown)
    @     0x7ffd3bf91cb5  caffe::Caffe::SetDevice()
    @           0x40a5a7  time()
    @           0x4080f8  main
    @     0x7ffd3aeb9ec5  (unknown)
    @           0x408618  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)

My NVIDIA driver is 352.41. I installed nvidia-352, and apt reports it as the newest version already installed:

sudo apt-get install nvidia-352
Reading package lists... Done
Building dependency tree       
Reading state information... Done
nvidia-352 is already the newest version.
The following packages were automatically installed and are no longer required:
  account-plugin-windows-live libupstart1
Use 'apt-get autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 31 not upgraded.

My Ubuntu system has NVIDIA driver 352 installed, so why do I get this error?

I0910 13:28:13.606891 10629 caffe.cpp:296] Use GPU with device ID 0
modprobe: ERROR: could not insert 'nvidia_352': No such device
F0910 13:28:13.728612 10629 common.cpp:142] Check failed: error == cudaSuccess (38 vs. 0)  no CUDA-capable device is detected

I checked whether I have a CUDA-capable device:

lspci | grep -i nvidia
05:00.0 VGA compatible controller: NVIDIA Corporation GK107GL [Quadro K2000] (rev a1)
05:00.1 Audio device: NVIDIA Corporation GK107 HDMI Audio Controller (rev a1)

I do have a CUDA-capable device, so why does the error occur?

Edit 1: Yes, my test with ./deviceQuery failed.

../NVIDIA_CUDA-7.5_Samples/bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL

I looked in the /dev folder, and nvidia0 is there:

crwxrwxrwx  1 root root    195,   0 Sep 10 16:51 nvidia0
crw-rw-rw-  1 root root    195, 255 Sep 10 16:51 nvidiactl

My nvcc -V check gives:

li@li-HP-Z420-Workstation:/dev$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

And my driver version check:

li@li-HP-Z420-Workstation:/dev$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.41  Fri Aug 21 23:09:52 PDT 2015
GCC version:  gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04) 

What could be wrong?


5 Answers

Stack Overflow user

Accepted answer

Posted 2018-06-28 17:04:04

Another approach is to install the driver from the .run file. This requires stopping the X server first, which is done as follows:

1. Make sure you are logged out.
2. Press CTRL+ALT+F1 and log in with your credentials.
3. Stop your current X server session with sudo service lightdm stop (or sudo stop lightdm).
4. Enter runlevel 3 (or 5) with sudo init 3 (or sudo init 5) and install your .run file.
5. You might be required to reboot when the installation finishes. If not, run sudo service lightdm start (or sudo start lightdm) to start your X server again.

Then run the .run file with sudo sh xxxxx.run.

You may get the error message "The distribution-provided pre-install script failed! Are you sure you want to continue?". If so, abort the installation and

disable the Nouveau kernel driver, then rebuild the initramfs with sudo update-initramfs -u.

Then reboot the system, stop the X server again, enter runlevel 3, and run sudo sh xxxxx.run once more.

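This time you can ignore the message and continue past the pre-install script failure. You should then be able to install the NVIDIA driver from the .run file.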
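
The answer does not show how the Nouveau driver is actually disabled; update-initramfs -u by itself only rebuilds the initramfs. A common way to do it, shown here as a sketch and as an assumption rather than what the answerer necessarily did, is to blacklist the module in a modprobe.d file first. The function name write_nouveau_blacklist and the file name blacklist-nouveau.conf are illustrative choices.

```shell
#!/bin/sh
# Sketch (assumption, not from the answer): write a modprobe blacklist
# for the nouveau module to the file given as the first argument.
# The path is a parameter so the function can be tried on a scratch file.
write_nouveau_blacklist() {
    target=$1
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$target"
}

# Assumed usage on the real system (as root), followed by the step
# the answer mentions:
#   write_nouveau_blacklist /etc/modprobe.d/blacklist-nouveau.conf
#   update-initramfs -u
```

After a reboot, lsmod | grep nouveau printing nothing would indicate that the module is no longer loaded.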

Votes: 0

Stack Overflow user

Posted 2015-09-14 11:24:42

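The problem is solved now. I checked with sudo dpkg --list | grep nvidia and found that the kernel module was 352.41 while the client (user-space) packages were 304.12. So I ran sudo apt-get remove --purge nvidia-*, which removed all the packages. Then I installed 352.41 with: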

$ sudo add-apt-repository ppa:xorg-edgers/ppa -y
$ sudo apt-get update
$ sudo apt-get install nvidia-352

After that:

$ sudo dpkg --list | grep nvidia
rc nvidia-304 304.128-0ubuntu0~gpu14.04.2 amd64 NVIDIA legacy binary driver - version 304.128
rc nvidia-304-updates 304.125-0ubuntu0.0.2 amd64 NVIDIA legacy binary driver - version 304.125
ii nvidia-352 352.41-0ubuntu0~gpu14.04.1 amd64 NVIDIA binary driver - version 352.41
rc nvidia-opencl-icd-304 304.128-0ubuntu0~gpu14.04.2 amd64 NVIDIA OpenCL ICD
rc nvidia-opencl-icd-304-updates 304.125-0ubuntu0.0.2 amd64 NVIDIA OpenCL ICD
ii nvidia-opencl-icd-352 352.41-0ubuntu0~gpu14.04.1 amd64 NVIDIA OpenCL ICD
ii nvidia-prime 0.6.2 amd64 Tools to enable NVIDIA's Prime
ii nvidia-settings 355.11-0ubuntu0~gpu14.04.1 amd64 Tool for configuring the NVIDIA graphics driver

Now the versions match, and ./deviceQuery and everything else work as expected. Thanks.
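
The version comparison described in this answer can be scripted. The following is a minimal sketch, assuming the dpkg --list and /proc/driver/nvidia/version formats shown on this page; the function names are hypothetical helpers, not part of any NVIDIA or Debian tool.

```shell
#!/bin/sh
# Print "<package> <version>" for every fully installed (status "ii")
# nvidia-NNN driver package in `dpkg --list` output read from stdin.
# Packages in state "rc" (removed, config files left) are skipped.
installed_nvidia_drivers() {
    awk '$1 == "ii" && $2 ~ /^nvidia-[0-9]+(-updates)?$/ { print $2, $3 }'
}

# Print the kernel module version from /proc/driver/nvidia/version
# text read from stdin (the field that follows the word "Module").
kernel_module_version() {
    awk '/^NVRM version:/ {
        for (i = 1; i < NF; i++) if ($i == "Module") { print $(i + 1); exit }
    }'
}

# Succeed if package version $2 (e.g. 352.41-0ubuntu0~gpu14.04.1)
# belongs to kernel module version $1 (e.g. 352.41).
versions_match() {
    case "$2" in
        "$1"|"$1"-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumed usage on the affected machine:
#   dpkg --list | installed_nvidia_drivers
#   kernel_module_version < /proc/driver/nvidia/version
```

If versions_match fails for the installed driver package, the user-space libraries and the loaded kernel module come from different driver releases, which is the situation this answer describes.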

Votes: 11

Stack Overflow user

Posted 2015-10-05 14:13:29

I had this problem too. Reinstalling the NVIDIA driver did not fix it.

In the end I solved it by adding two kernel parameters through GRUB.

Add to:

GRUB_CMDLINE_LINUX_DEFAULT

the following:

pci=nocrs pci=realloc

I think this is a conflict between CUDA 7.5 and kernel 3.19.

Votes: 2
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/32493904
