Installing the GPU version of TensorFlow or PyTorch on deepin 15.11

My setup

Machine: desktop
CPU: AMD Ryzen 5 3600
GPU: RTX 2060

The way the graphics driver is installed may differ between machines (especially dual-GPU laptops), but everything after the driver step should apply universally.
This article is for reference only~~

The final versions after a successful install, for reference:

NVIDIA driver: 430.50
TensorFlow: 2.0
CUDA: 10.1
cuDNN: 7.6.4

References:

deepin 15.8 + NVIDIA 390.87 + CUDA 9.0 + cuDNN 7.4 + tensorflow-gpu 1.9: a configuration war story
Installing Python 3.6.9 on deepin 15.10.2
Installing Jupyter Notebook on deepin 15.10.2

Installing the NVIDIA driver

This section draws on: Installing the Nvidia driver on Deepin

Download the driver

Find a suitable driver at https://www.geforce.cn/drivers and download it,
then place the file in your home directory (NVIDIA-Linux-x86_64-430.50.run).

Disable the nouveau driver

# Install the pluma editor first, or edit the file by hand
sudo apt-get install pluma
sudo pluma /etc/modprobe.d/blacklist.conf
 
## Alternatively, right-click the folder, open it as administrator, and edit the
## file manually (you may need to create blacklist.conf). Then add these lines:
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off

Apply the change:

sudo update-initramfs -u

Reboot and log back into the system.
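After the reboot, you can confirm that nouveau really stayed out of the kernel. This is just a quick sanity check of the blacklist step above; no output from grep means the module is not loaded:

```shell
# No output means nouveau is successfully disabled.
lsmod | grep nouveau
```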

Install the driver

Shut down the graphical session

sudo service lightdm stop

Log in with your username and password at the console, then switch to text mode:

sudo init 3

Make the NVIDIA driver installer executable (double-check that the path is correct):

chmod 777 ./NVI.............run

Install the driver. Note that many prompts will pop up during installation; if you understand them, answer as appropriate, otherwise choosing YES all the way through is fine.

sudo ./NVI.............run

If all goes well, the installation will succeed here. If it fails, that's fine too: bring the desktop back up (next step) and look for another driver-installation guide; the steps after the driver section still apply. :)

Bring the desktop back up

sudo service lightdm start

Check whether the driver installed correctly

First check: after a successful install, the screen resolution should switch to the maximum your monitor supports.
Second check: run nvidia-smi and you should see output like this:

jansora@jansora-PC:~$ nvidia-smi 
Fri Oct 18 15:37:06 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:08:00.0  On |                  N/A |
| 34%   33C    P8    21W / 165W |     91MiB /  5931MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      4279      G   /usr/lib/xorg/Xorg                            60MiB |
|    0      4733      G   kwin_x11                                      17MiB |
+-----------------------------------------------------------------------------+

Install CUDA 10.1

Make sure your driver supports CUDA 10.1 (CUDA 10.1 requires 418.x or higher).

Download CUDA 10.1

Open https://developer.nvidia.com/cuda-10.1-download-archive-base in a browser, choose Linux / x86_64 / Ubuntu / 16.04 / runfile (local), and download cuda_10.1.243_418.87.00_linux.run. (Don't feed the page URL straight to wget: the unquoted & characters would be split by the shell, and the link points at an HTML page, not the runfile itself.)

Make it executable

chmod 755 cuda_10.1.243_418.87.00_linux.run

Install CUDA 10.1

On deepin 15.11 the CUDA 10.1 installer must not be run as root via sudo; doing so throws an error related to /var/log/nvidia/.uninstallManifests (the details are out of scope here).
The workaround is to install into your home directory and then move the result to /usr/local, as described in the following steps.

Create the target directory

cd ~
mkdir cuda-10.1

Run the installer, targeting the ~/cuda-10.1 directory.

It first shows the EULA; press q to skip reading it, then type accept to start the installation.

 ./cuda_10.1.243_418.87.00_linux.run  --toolkitpath=$HOME/cuda-10.1 --defaultroot=$HOME/cuda-10.1

Select only CUDA Toolkit 10.1 and deselect everything else (clear the [X] marks):

┌──────────────────────────────────────────────────────────────────────────────┐
│ CUDA Installer                                                               │
│ - [ ] Driver                                                                 │
│      [ ] 418.87.00                                                           │
│ + [X] CUDA Toolkit 10.1                                                      │
│   [ ] CUDA Samples 10.1                                                      │
│   [ ] CUDA Demo Suite 10.1                                                   │
│   [ ] CUDA Documentation 10.1                                                │
│   Options                                                                    │
│   Install                                                                    │
│ Up/Down: Move | Left/Right: Expand | 'Enter': Select | 'A': Advanced options │
└──────────────────────────────────────────────────────────────────────────────┘

If all goes well, the installation will succeed.

Move it under /usr/local

 sudo mv cuda-10.1 /usr/local/

Create a symlink

 sudo ln -sv /usr/local/cuda-10.1/ /usr/local/cuda

Set the CUDA environment variables

Either ~/.bashrc or /etc/profile works; /etc/profile is recommended.
Run sudo vim /etc/profile and add the following:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64/:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda/bin:$PATH

Reload the configuration:

source /etc/profile

Verify the installation

nvcc -V

Output like the following means the installation succeeded.

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

Install cuDNN 7.6

Download cuDNN 7.6

You need to log in to an NVIDIA account to download; signing in with a QQ account works fine.

Download page: https://developer.nvidia.com/rdp/cudnn-download
Choose "cuDNN Library for Linux" to download.

Extract

tar xvf cudnn-*.tgz 

Copy the files

cd cuda
sudo cp include/* /usr/local/cuda/include/ 
sudo cp lib64/libcudnn.so.7.6.4 lib64/libcudnn_static.a /usr/local/cuda/lib64/ 
cd /usr/lib/x86_64-linux-gnu 
sudo ln -s libcudnn.so.7.6.4 libcudnn.so.7
sudo ln -s libcudnn.so.7 libcudnn.so
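To double-check that the headers landed where the toolkit expects them, you can read the version macros back out of the copied header. A small sketch, assuming the path from the copy step above (note that on cuDNN 8 and later these macros moved to cudnn_version.h):

```shell
# Should print MAJOR 7, MINOR 6, PATCHLEVEL 4 for cuDNN 7.6.4.
grep -E "#define CUDNN_(MAJOR|MINOR|PATCHLEVEL)" /usr/local/cuda/include/cudnn.h
```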

Set the environment variables

Either ~/.bashrc or /etc/profile works.
Run sudo vim /etc/profile and add the following:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"

Install NCCL 2.4.8

Download NCCL 2.4.8

https://developer.nvidia.com/nccl/nccl-download

Install

tar xvf nccl_2.4.8-1+cuda10.1_x86_64.txz
cd nccl_2.4.8-1+cuda10.1_x86_64
sudo mkdir -p /usr/local/cuda/nccl/lib /usr/local/cuda/nccl/include 
sudo cp *.txt /usr/local/cuda/nccl 
sudo cp include/*.h /usr/include/ 
sudo cp lib/libnccl.so.2.4.8 lib/libnccl_static.a /usr/lib/x86_64-linux-gnu/ 
sudo ln -s /usr/include/nccl.h /usr/local/cuda/nccl/include/nccl.h 
cd /usr/lib/x86_64-linux-gnu 
sudo ln -s libnccl.so.2.4.8 libnccl.so.2 
sudo ln -s libnccl.so.2 libnccl.so 
for i in libnccl*; do sudo ln -s /usr/lib/x86_64-linux-gnu/$i /usr/local/cuda/nccl/lib/$i; done
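After copying the libraries, it's worth confirming that the dynamic linker can actually find them. A quick check, assuming /usr/lib/x86_64-linux-gnu is already on the linker path (as it is by default on deepin):

```shell
# Refresh the linker cache, then list the NCCL entries it knows about.
sudo ldconfig
ldconfig -p | grep libnccl
```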

If you don't need to build TensorFlow from source yourself, the JDK and Bazel steps below can be skipped.


Install JDK 8

sudo apt install openjdk-8-jdk

Install Bazel 0.26.1

The Bazel version must not be higher than 0.26.1; otherwise the TensorFlow build fails with:

Please downgrade your bazel installation to version 0.26.1 or lower to build TensorFlow! To downgrade: download the installer for the old version (from https://github.com/bazelbuild/bazel/releases) then run the installer.

Download Bazel 0.26.1

https://github.com/bazelbuild/bazel/releases/download/0.26.1/bazel-0.26.1-installer-linux-x86_64.sh

Install Bazel 0.26.1

Do not run the Bazel installer from a directory whose path contains Chinese characters.

sudo chmod 755 ./bazel-0.26.1-installer-linux-x86_64.sh 
 ./bazel-0.26.1-installer-linux-x86_64.sh --user

Set the environment variables

  1. Edit the script: vim ~/.bashrc
  2. Append the following line at the end of the file:
export PATH="$PATH:$HOME/bin"
  3. Reload the configuration: source ~/.bashrc

Verify the Bazel installation

bazel version

Output like the following means success:

WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
Build label: 0.26.1
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Jun 6 11:05:05 2019 (1559819105)
Build timestamp: 1559819105
Build timestamp as int: 1559819105

Build and install TensorFlow 2.0 from source

Building the pip package by hand is not recommended: with network conditions in mainland China, downloading files from GitHub during the build will almost always fail.

Download TensorFlow 2.0

https://github.com/tensorflow/tensorflow/archive/r2.0.zip

Extract

You may also need a tool that can extract zip files; install one with sudo apt install unzip.

unzip  tensorflow-r2.0.zip
cd tensorflow-r2.0

Run ./configure; a sample session:

WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.26.1 installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/local/bin/python3

Found possible Python library paths:
  /usr/local/lib/python3.8/site-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python3.8/site-packages]

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: 
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: 
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: 
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Do you wish to build TensorFlow with TensorRT support? [y/N]: 
No TensorRT support will be enabled for TensorFlow.

Found CUDA 10.1 in:
    /usr/local/cuda/lib64
    /usr/local/cuda/include
Found cuDNN 7 in:
    /usr/local/cuda/lib64
    /usr/local/cuda/include


Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 3.5,7.0]: 7.5


Do you want to use clang as CUDA compiler? [y/N]: 
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 


Do you wish to build TensorFlow with MPI support? [y/N]: 
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]: 


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: 
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
        --config=mkl            # Build with MKL support.
        --config=monolithic     # Config for mostly static monolithic build.
        --config=gdr            # Build with GDR support.
        --config=verbs          # Build with libverbs support.
        --config=ngraph         # Build with Intel nGraph support.
        --config=numa           # Build with NUMA support.
        --config=dynamic_kernels        # (Experimental) Build kernels into separate shared objects.
        --config=v2             # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
        --config=noaws          # Disable AWS S3 filesystem support.
        --config=nogcp          # Disable GCP support.
        --config=nohdfs         # Disable HDFS support.
        --config=noignite       # Disable Apache Ignite support.
        --config=nokafka        # Disable Apache Kafka support.
        --config=nonccl         # Disable NVIDIA NCCL support.
Configuration finished

Build the pip package

bazel build --config=opt --config=cuda --config=v2 //tensorflow/tools/pip_package:build_pip_package
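The bazel target above only produces a packaging script; per TensorFlow's official build documentation, a second step generates the actual wheel, which you then install with pip (the /tmp/tensorflow_pkg output directory is just an example):

```shell
# Build the .whl into /tmp/tensorflow_pkg, then install it.
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip3 install /tmp/tensorflow_pkg/tensorflow-*.whl
```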

Install tensorflow-gpu with pip

As of this article's publication date, TensorFlow 2.0 did not yet ship a GPU build, so install the latest GPU version available from pip:
pip3 install tensorflow-gpu

Install PyTorch with pip

pip3 install torch torchvision
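As a final sanity check (assuming both installs above succeeded), you can ask each framework from the shell whether it can see the GPU; both commands should print True on a working setup:

```shell
# TensorFlow 2.0 still ships tf.test.is_gpu_available(); later releases
# prefer tf.config.list_physical_devices('GPU').
python3 -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
python3 -c "import torch; print(torch.cuda.is_available())"
```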

Done: the GPU versions of TensorFlow and PyTorch are installed.
