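Updated: 2019-4-5
As luck would have it, I managed to build and install the GPU version of tensorflow from source. tensorflow is now at version 1.13, and the official Linux binaries are built against glibc 2.23, while CentOS only goes up to glibc 2.17. As a result, after installing with pip install tensorflow-gpu, using the package fails with: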
ImportError: /lib64/libc.so.6: version `GLIBC_2.23' not found (required by /usr/local/python3.6/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow.so)
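To confirm that the glibc on the machine really is too old, a quick check like the one below may help. It is only a sketch and not part of the original steps: it assumes GNU coreutils (`sort -V`) and `getconf` are available, and `version_ge` is a helper name made up for illustration.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A >= version B.
# Relies on GNU sort -V (version sort): B must sort first (or be equal).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

required="2.23"   # the version named in the ImportError above

# On glibc systems getconf prints something like "glibc 2.17"; keep the number.
installed="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')"

if [ -z "$installed" ]; then
    echo "could not determine the glibc version"
elif version_ge "$installed" "$required"; then
    echo "glibc $installed is new enough for the prebuilt wheel"
else
    echo "glibc $installed is older than $required: build from source"
fi
```

If the last message is printed, the prebuilt wheel can be expected to fail with the error shown above.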
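Upgrading glibc is not an option: after the upgrade you can no longer get into the system. The only way out is to build tensorflow from source, which avoids the error. The build process is described below, for the latest version, 1.13: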
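Installing CUDA and cuDNN needs little explanation, since there are plenty of tutorials online. I used cuda 10.0 and cudnn 7.5.0. For reference: https://www.jianshu.com/p/a201b91b3d96 note: be sure to remember your cuda and cudnn versions and where cuda is installed, because they are needed later.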
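nccl is required by the GPU version of tensorflow. The current version is 2.4.2. Download: https://developer.nvidia.com/nccl/nccl-download
The download should be an rpm file. Install it with: rpm -ivh nccl-repo-rhel7-2.4.2-ga-cuda10.0-1-1.x86_64.rpm
Oddly, this does not install nccl itself. It only unpacks three more rpm files. To see where they were placed, run: rpm -qpl nccl-repo-rhel7-2.4.2-ga-cuda10.0-1-1.x86_64.rpm
Go to that directory and install the three rpm files. They should install to /usr/lib64 by default; if you are not sure, use rpm -qpl xxx.rpm to list the install locations.
note: remember the nccl version and install location here.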
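Once the three packages are installed, a small check can confirm where the library ended up. This is a sketch under assumptions: `has_nccl` is a hypothetical helper, and the second directory in the loop is only a guess at another place the library might live.

```shell
#!/usr/bin/env bash
# has_nccl DIR: succeeds if DIR contains an NCCL shared library (libnccl.so*).
has_nccl() {
    ls "$1"/libnccl.so* >/dev/null 2>&1
}

# /usr/lib64 is the default named above; the CUDA directory is an assumption.
for dir in /usr/lib64 /usr/local/cuda-10.0/lib64; do
    if has_nccl "$dir"; then
        echo "NCCL library found in $dir"
    fi
done
```

Whichever directory is reported is the one to remember for the configure step later.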
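bazel is Google's build tool, and tensorflow is built with it, so it has to be installed. Download link: https://github.com/bazelbuild/bazel/releases Pick the latest release and download it.
After downloading, create a directory named bazel, put the file in it, and unpack it: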
unzip bazel-0.24.1-dist.zip
After unpacking, build it:
./compile.sh
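After a while it reports success. The compiled binary is in the bazel/output directory. Add that directory to PATH in ~/.bashrc, and make sure the path is exactly right.
Then run: source ~/.bashrc and type bazel on the command line. If it prints its usage information, the installation succeeded.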
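The PATH step can be sketched as follows. The post does not show the exact line, so this is an illustration only: `add_path_once` is a hypothetical helper, and the directory passed to it must be replaced with the real location of bazel/output.

```shell
#!/usr/bin/env bash
# add_path_once FILE DIR: append an "export PATH" line for DIR to FILE,
# unless exactly that line is already present (safe to run repeatedly).
add_path_once() {
    local file="$1"
    local dir="$2"
    local line="export PATH=\"\$PATH:${dir}\""
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

Usage, assuming the source was unpacked in a bazel directory under the home directory: add_path_once ~/.bashrc "$HOME/bazel/output", followed by source ~/.bashrc.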
Next, download the tensorflow source:
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
Start configuring the build:
./configure
Note: answer Y to the questions related to cuda and nccl, and no to everything else:
Please specify the location of python. [Default is /usr/bin/python]:
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: n
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
Do you wish to build TensorFlow with GDR support? [y/N]: N
Do you wish to build TensorFlow with VERBS support? [y/N]: N
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
Do you wish to build TensorFlow with CUDA support? [y/N]: Y
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 10.0]:10.0
Please specify the location where CUDA 10.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-10.0
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.5.0
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.0]: /usr/local/cuda-10.0
Do you wish to build TensorFlow with TensorRT support? [y/N]: N
Please specify the NCCL version you want to use. [Leave empty to default to NCCL 2]: 2.4.2
Please specify the location where NCCL 2 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.0]: /usr/lib64
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1] (Just press Enter here. There is no need to choose anything; the default is fine.)
Do you want to use clang as CUDA compiler? [y/N]: N
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: /usr/bin/gcc
Do you wish to build TensorFlow with MPI support? [y/N]: N
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: -march=native
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:N
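The answers above can also be supplied through environment variables, which the configure script reads before it prompts. The sketch below mirrors the CUDA-related answers from the session above. The exact set of variables differs between tensorflow versions, so treat the list as an assumption to check against the configure.py in your checkout; anything left unset (the python location, for example) is still asked interactively.

```shell
#!/usr/bin/env bash
# Answers taken from the interactive session above.
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=10.0
export CUDA_TOOLKIT_PATH=/usr/local/cuda-10.0
export TF_CUDNN_VERSION=7.5.0
export CUDNN_INSTALL_PATH=/usr/local/cuda-10.0
export TF_NCCL_VERSION=2.4.2
export NCCL_INSTALL_PATH=/usr/lib64
export TF_CUDA_COMPUTE_CAPABILITIES=6.1   # the default offered in the session above
export TF_CUDA_CLANG=0
export GCC_HOST_COMPILER_PATH=/usr/bin/gcc

# Everything else is answered "no".
export TF_ENABLE_XLA=0
export TF_NEED_OPENCL_SYCL=0
export TF_NEED_TENSORRT=0
export TF_NEED_MPI=0
export TF_SET_ANDROID_WORKSPACE=0
export CC_OPT_FLAGS="-march=native"

# Then, from the tensorflow source directory:
#   ./configure
```

This makes the configuration repeatable if the build has to be redone.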
Build with the following command:
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
Just wait for it to finish; it takes quite a while. If the build succeeds, the hard part is over.
Convert the result to a whl file:
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Install the file with pip:
pip install /tmp/tensorflow_pkg/*.whl
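To check that the ImportError is gone, something like the following can be run after the install. It is a sketch, not part of the original steps: the `PYTHON` variable is an assumption (point it at the interpreter that pip installed into), and `tf.test.is_gpu_available()` is the TensorFlow 1.x call for checking GPU visibility.

```shell
#!/usr/bin/env bash
# Interpreter to test; override with e.g. PYTHON=python3.6 (assumption).
PYTHON="${PYTHON:-python}"

if "$PYTHON" -c 'import tensorflow' 2>/dev/null; then
    # Print the installed version and whether a GPU is visible to tensorflow.
    "$PYTHON" -c 'import tensorflow as tf; print(tf.__version__); print(tf.test.is_gpu_available())'
    tf_status="installed"
else
    echo "tensorflow is not importable from $PYTHON"
    tf_status="missing"
fi
```

Run it from outside the tensorflow source directory, otherwise python may try to import the source tree instead of the installed package.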