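This walkthrough follows the article "ubuntu16.04+gtx1060+cuda8.0+caffe安装、测试经历" (installing and testing Ubuntu 16.04 + GTX 1060 + CUDA 8.0 + Caffe), with some differences in the details.
First, a note on the setup: this was done on a desktop. Windows 10 was installed first, then Ubuntu 16.04 alongside it as a dual boot. The graphics card is a GTX 1060.
The desktop monitor is connected to the GTX 1060's HDMI port, and the latest GTX 1060 driver (375) had already been installed under Windows 10.
Enough talk. Let's get going.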
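I. Install the NVIDIA graphics driver
My monitor is 1080p, and before the graphics driver is installed Ubuntu runs at a very low resolution. You can raise it by editing the grub file by hand. In the terminal enter:
sudo vim /etc/default/grub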
Find the following lines:
# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command 'vbeinfo'
# GRUB_GFXMODE=640x480
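Press a to enter insert mode and add the following line:
GRUB_GFXMODE=1920x1080
# set the resolution to match your own monitor; use a plain lowercase x, not ×
Press Esc to leave insert mode, then type :wq to save and quit.
In the terminal run
sudo update-grub
to update grub, then reboot Ubuntu for the change to take effect.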
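The vim edit above can also be expressed as a small text transformation. This is a minimal sketch, not part of the original procedure: `set_gfxmode` is a hypothetical helper that works on the text of /etc/default/grub, prints the result for review instead of writing it (writing needs sudo), and rejects the Unicode multiplication sign, since GRUB expects a plain lowercase x in WIDTHxHEIGHT.

```python
import os
import re

# Accepts "WIDTHxHEIGHT", optionally "xDEPTH", or "auto"; rejects the "×" character.
_MODE_RE = re.compile(r"\d+x\d+(x\d+)?|auto")
# A GRUB_GFXMODE line, whether active or commented out with a leading "#".
_LINE_RE = re.compile(r"^#?[ \t]*GRUB_GFXMODE=.*$", re.MULTILINE)


def set_gfxmode(grub_text: str, mode: str = "1920x1080") -> str:
    """Return grub_text with an active GRUB_GFXMODE=<mode> line (hypothetical helper)."""
    if not _MODE_RE.fullmatch(mode):
        raise ValueError(f"invalid GRUB_GFXMODE value: {mode!r}")
    new_line = f"GRUB_GFXMODE={mode}"
    if _LINE_RE.search(grub_text):
        # Replace the first existing (or commented-out) GRUB_GFXMODE line.
        return _LINE_RE.sub(new_line, grub_text, count=1)
    # No such line yet: append one at the end of the file.
    return grub_text.rstrip("\n") + "\n" + new_line + "\n"


if __name__ == "__main__" and os.path.exists("/etc/default/grub"):
    with open("/etc/default/grub") as f:
        print(set_gfxmode(f.read()))
```

Whichever way the file is edited, `sudo update-grub` still has to run before the new mode is used.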
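Next, open Ubuntu's System Settings > Software & Updates > Additional Drivers and select the NVIDIA driver.
After it installs, reboot so that the GTX 1060 driver takes effect.
Test: in the terminal enter
nvidia-smi
If the output looks like the figure below, the installation succeeded.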
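For a check that does not depend on reading the full nvidia-smi table, nvidia-smi can print selected fields as CSV via its `--query-gpu` and `--format=csv,noheader` options. The sketch below, with hypothetical helper names, runs that query when nvidia-smi is on PATH and parses each line into a (name, driver version) pair.

```python
from __future__ import annotations

import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]


def parse_gpu_csv(csv_text: str) -> list[tuple[str, str]]:
    """Turn 'name, driver_version' lines into (name, driver_version) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, driver = (field.strip() for field in line.split(",", 1))
        gpus.append((name, driver))
    return gpus


def query_gpus() -> list[tuple[str, str]]:
    """Run nvidia-smi if it is on PATH; return [] when the driver tools are missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("nvidia-smi not found: the driver is probably not installed yet")
    for name, driver in gpus:
        print(f"{name}: driver {driver}")
```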
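II. Install CUDA
Download cuda_8.0.61_375.26_linux.run and cudnn-8.0-linux-x64-v5.1.tgz.
I shared both files on Baidu Netdisk. I downloaded them under Windows 10 first and copied them with a USB drive into Ubuntu's Downloads folder (下载 on a Chinese-locale system).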
Install CUDA 8.0
In the terminal enter:
cd 下载/
sh cuda_8.0.61_375.26_linux.run --override
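This starts the installer. Keep pressing Space to page to the end of the license (or press Q to jump there), then type accept to accept the terms.
Enter n to skip installing the NVIDIA graphics driver, since it is already installed.
Enter y to install the CUDA 8.0 Toolkit.
Press Enter to accept the default install path: /usr/local/cuda-8.0
Enter y to run the installation with sudo privileges, then enter your password.
Enter y or n to install or skip the symbolic link /usr/local/cuda. The later steps use /usr/local/cuda paths, so y is the simpler choice.
Enter y to install the CUDA 8.0 Samples for testing later.
Press Enter to accept the default Samples install path /home/yt (yt is my username). This directory can be deleted after testing.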
Install cuDNN v5.1
In the terminal enter:
cd 下载/
tar zxvf cudnn-8.0-linux-x64-v5..tgz
This extracts a cuda directory inside the Downloads folder.
cd cuda
sudo cp lib64/* /usr/local/cuda/lib64/ #copy the shared libraries
sudo cp include/cudnn.h /usr/local/cuda/include/ #copy the header file
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/lib64/libcudnn* #give all users read permission on these files
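The symlink step depends on the exact cuDNN version. cuDNN 5 records it in cudnn.h as `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` defines, so it can be read from the header copied above. A minimal sketch with hypothetical helper names; `link_names` assumes the usual libcudnn.so.MAJOR.MINOR.PATCH naming, so compare its output with `ls /usr/local/cuda/lib64/libcudnn*` before running anything. It prints the `ln -s` commands rather than executing them.

```python
from __future__ import annotations

import os
import re

HEADER = "/usr/local/cuda/include/cudnn.h"  # where the copy step put the header


def cudnn_version(header_text: str) -> tuple[int, int, int]:
    """Read CUDNN_MAJOR / CUDNN_MINOR / CUDNN_PATCHLEVEL from the text of cudnn.h."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(rf"^#define\s+{key}\s+(\d+)", header_text, re.MULTILINE)
        if m is None:
            raise ValueError(f"{key} not found")
        parts.append(int(m.group(1)))
    return parts[0], parts[1], parts[2]


def link_names(version: tuple[int, int, int]) -> tuple[str, str, str]:
    """(real file, soname link, plain link), assuming libcudnn.so.MAJOR.MINOR.PATCH naming."""
    major, minor, patch = version
    return f"libcudnn.so.{major}.{minor}.{patch}", f"libcudnn.so.{major}", "libcudnn.so"


if __name__ == "__main__" and os.path.exists(HEADER):
    with open(HEADER) as f:
        real, soname, plain = link_names(cudnn_version(f.read()))
    # Commands to run inside /usr/local/cuda/lib64
    print(f"sudo ln -s {real} {soname}")
    print(f"sudo ln -s {soname} {plain}")
```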
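Create the symbolic links (plain cp follows symlinks, so the links in lib64 arrived as separate regular files and need to be rebuilt). In the terminal enter:
cd /usr/local/cuda/lib64/
sudo rm -rf libcudnn.so libcudnn.so.
sudo ln -s libcudnn.so.5.1. libcudnn.so. #the exact file names depend on your version
sudo ln -s libcudnn.so. libcudnn.so
Set the environment variables. In the terminal enter: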
sudo gedit /etc/profile
Append at the end:
PATH=/usr/local/cuda/bin:$PATH
export PATH
After saving, create the linker configuration file:
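sudo vim /etc/ld.so.conf.d/cuda.conf
Press a to enter insert mode and add the following line:
/usr/local/cuda/lib64
Press Esc to leave insert mode, then type :wq to save and quit.
Finally, in the terminal run
sudo ldconfig
to make the links take effect.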
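To confirm both settings took effect, one can check that `nvcc` resolves on PATH and that the cuDNN libraries appear in the dynamic linker cache printed by `ldconfig -p`. This is a hedged sketch: `cached_libs` is a hypothetical helper that parses lines of the form `libname (tags) => /path`.

```python
from __future__ import annotations

import shutil
import subprocess


def cached_libs(ldconfig_output: str, prefix: str = "libcudnn") -> dict[str, str]:
    """Map library name -> path for `ldconfig -p` entries whose name starts with prefix."""
    found = {}
    for line in ldconfig_output.splitlines():
        line = line.strip()
        if not line.startswith(prefix) or "=>" not in line:
            continue
        left, path = line.split("=>", 1)
        found[left.split()[0]] = path.strip()  # drop the "(libc6,x86-64)" tag
    return found


if __name__ == "__main__":
    # nvcc only appears on PATH in sessions started after /etc/profile was re-read.
    print("nvcc:", shutil.which("nvcc") or "not on PATH")
    if shutil.which("ldconfig"):
        out = subprocess.run(["ldconfig", "-p"], capture_output=True, text=True).stdout
        print("cuDNN in linker cache:", cached_libs(out) or "none")
```

/etc/profile is read at login, so log out and back in (or run `source /etc/profile` in the current shell) before expecting nvcc to be found.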
Test the CUDA Samples
Go to the default CUDA 8.0 Samples install path. In the terminal enter (yt is my username):
cd /home/yt/NVIDIA_CUDA-8.0_Samples
sudo make all -j4
(-j4 because the CPU has 4 cores)
The build fails with the error "unsupported GNU version! gcc versions later than 5.3 are not supported!" because the GCC version is too new. In the terminal enter:
cd /usr/local/cuda-8.0/include
sudo cp host_config.h host_config.h.bak
sudo gedit host_config.h
Press Ctrl+F and search for "5.3". There is only one place, which looks like this:
#if __GNUC__ > 5 || (__GNUC__ == 5 && __GNUC_MINOR__ > 3)
#error -- unsupported GNU version! gcc versions later than 5.3 are not supported!
Change both 5s to 6:
#if __GNUC__ > 6 || (__GNUC__ == 6 && __GNUC_MINOR__ > 3)
Save and quit, then continue in the terminal (yt is my username):
cd /home/yt/NVIDIA_CUDA-8.0_Samples
sudo make all -j4
(4 cores)
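Why the edit works can be seen by evaluating the preprocessor condition directly. The Python function below is only a model of that `#if` test (the real check happens when nvcc includes host_config.h); the GCC 5.4 figure is an assumption about the stock Ubuntu 16.04 compiler, so confirm yours with `gcc --version`. Note that the edit only silences the check; CUDA 8.0 was not validated against compilers past 5.3.

```python
def nvcc_rejects_gcc(gcc_major: int, gcc_minor: int, max_major: int, max_minor: int) -> bool:
    """Python model of: #if __GNUC__ > max_major || (__GNUC__ == max_major && __GNUC_MINOR__ > max_minor)"""
    return gcc_major > max_major or (gcc_major == max_major and gcc_minor > max_minor)


# GCC 5.4, the usual compiler on Ubuntu 16.04:
print(nvcc_rejects_gcc(5, 4, max_major=5, max_minor=3))  # original guard: True, #error fires
print(nvcc_rejects_gcc(5, 4, max_major=6, max_minor=3))  # edited guard: False, build proceeds
```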
When the build finishes, continue in the terminal:
cd bin/x86_64/linux/release
./deviceQuery
If the output looks like the figure below, CUDA is installed successfully.
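The same check can be scripted. This sketch assumes that deviceQuery ends its report with a `Result = PASS` line when it finds a usable CUDA device; `DEVICE_QUERY` is a hypothetical path built from the default Samples location chosen during the install.

```python
import subprocess
from pathlib import Path

# Hypothetical path: the Samples location chosen during the CUDA install.
DEVICE_QUERY = Path.home() / "NVIDIA_CUDA-8.0_Samples/bin/x86_64/linux/release/deviceQuery"


def device_query_passed(output: str) -> bool:
    """True if the deviceQuery report contains 'Result = PASS' (assumed summary line)."""
    return "Result = PASS" in output


if __name__ == "__main__" and DEVICE_QUERY.exists():
    out = subprocess.run([str(DEVICE_QUERY)], capture_output=True, text=True).stdout
    print("CUDA OK" if device_query_passed(out) else "deviceQuery did not report PASS")
```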
III. Install dependency packages
sudo apt-get install build-essential #essential build tools
sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get install --no-install-recommends libboost-all-dev
sudo apt-get install libatlas-base-dev
sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev
IV. Install pip and easy_install for Python, to make installing packages easier
In the terminal enter:
cd
wget --no-check-certificate https://bootstrap.pypa.io/ez_setup.py
sudo python ez_setup.py --insecure
wget https://bootstrap.pypa.io/get-pip.py
sudo python get-pip.py
V. Install some scientific computing libraries needed by Python
In the terminal enter:
sudo apt-get install libblas-dev liblapack-dev libatlas-base-dev gfortran python-numpy
VI. Install git and clone the source
In the terminal enter:
sudo apt-get install git
git clone https://github.com/BVLC/caffe.git
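VII. Install the Python dependencies
In the terminal enter:
sudo apt-get install python-pip
(installs pip)
cd /home/yt/caffe/python
sudo su
for req in $(cat "requirements.txt"); do pip install -i https://pypi.tuna.tsinghua.edu.cn/simple $req; done
(-i points pip at the Tsinghua University PyPI mirror.) Press Ctrl+D to leave the sudo su shell.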
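The shell loop above installs one requirement per pip call, so a single failure does not stop the rest. A minimal sketch of the same idea in Python, with a hypothetical `pip_commands` helper: unlike `$(cat requirements.txt)`, which splits on any whitespace, it keeps each line intact and skips blank lines and `#` comments. It prints the commands; pass each list to `subprocess.run` to actually install.

```python
from __future__ import annotations

import os
import shlex

MIRROR = "https://pypi.tuna.tsinghua.edu.cn/simple"


def pip_commands(requirements_text: str, index_url: str = MIRROR) -> list[list[str]]:
    """One `pip install -i <index_url> <requirement>` argv per non-empty, non-comment line."""
    commands = []
    for raw in requirements_text.splitlines():
        req = raw.split("#", 1)[0].strip()  # simple comment stripping; fine for name>=version lines
        if req:
            commands.append(["pip", "install", "-i", index_url, req])
    return commands


if __name__ == "__main__" and os.path.exists("requirements.txt"):
    with open("requirements.txt") as f:
        for cmd in pip_commands(f.read()):
            print(shlex.join(cmd))  # or subprocess.run(cmd) to actually install
```

If one-by-one installation is not needed, `pip install -i <mirror> -r requirements.txt` installs everything in a single call.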
VIII. Build Caffe (MATLAB is not covered here)
In the terminal enter:
cd /home/yt/caffe
cp Makefile.config.example Makefile.config
gedit Makefile.config
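Instead of editing by hand in gedit, the two Makefile.config changes described next can be applied with a short script. This is a sketch under assumptions: `patch_makefile_config` is a hypothetical helper that expects the stock `# USE_CUDNN := 1` and `INCLUDE_DIRS := ...` lines from Makefile.config.example. It is idempotent, so running it twice does not add the HDF5 path twice.

```python
import os
import re

HDF5_INCLUDE = "/usr/include/hdf5/serial"


def patch_makefile_config(text: str) -> str:
    """Enable cuDNN and append the serial HDF5 include dir to INCLUDE_DIRS."""
    # 1. "# USE_CUDNN := 1" -> "USE_CUDNN := 1"
    text = re.sub(r"^#[ \t]*(USE_CUDNN\s*:=\s*1)", r"\1", text, flags=re.MULTILINE)

    # 2. Add the HDF5 include directory to INCLUDE_DIRS, only if it is not there yet.
    def add_hdf5(m: "re.Match[str]") -> str:
        line = m.group(0)
        return line if HDF5_INCLUDE in line else f"{line} {HDF5_INCLUDE}"

    return re.sub(r"^INCLUDE_DIRS[ \t]*:=.*$", add_hdf5, text, flags=re.MULTILINE)


if __name__ == "__main__" and os.path.exists("Makefile.config"):
    with open("Makefile.config") as f:
        patched = patch_makefile_config(f.read())
    with open("Makefile.config", "w") as f:
        f.write(patched)
```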
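① Uncomment the line
USE_CUDNN := 1
② At the end of the line
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
add a space followed by /usr/include/hdf5/serial
Without this, the build may fail with an error that hdf5.h cannot be found. In the terminal enter: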
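make all -j4
During make, the linker reports that it cannot find -lhdf5_hl and -lhdf5.
Fix:
Search the computer for libhdf5_serial.so.10.1.0. Once it is found, right-click it and choose Open Containing Folder.
Right-click an empty area of that folder and choose Open in Terminal. In the new terminal enter:
sudo ln libhdf5_serial.so.10.1.0 libhdf5.so
sudo ln libhdf5_serial_hl.so.10.0.2 libhdf5_hl.so
Finally, run
sudo ldconfig
to make the links take effect.
Back in the original terminal, run
make clean
to clear the results of the first build, then run
make all -j4
again to rebuild.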
In the terminal enter:
make test -j4
make runtest -j4
make pycaffe -j4
make distribute #build the release package
Test the Python interface. In the terminal enter:
pip install protobuf -i https://pypi.tuna.tsinghua.edu.cn/simple
cd /home/yt/caffe/python
python
import caffe
If no error is reported, the build succeeded.
IX. MNIST test
Download the MNIST dataset. In the terminal enter:
cd /home/yt/caffe/data/mnist/
./get_mnist.sh
This fetches the MNIST dataset.
Four new files appear in /home/yt/caffe/data/mnist/: the training images, training labels, test images and test labels.
Convert the MNIST data format. In the terminal enter:
cd /home/yt/caffe/
./examples/mnist/create_mnist.sh
The second line must be run after the first, i.e. create_mnist.sh must be run from the Caffe root directory.
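To sanity-check the four raw files that get_mnist.sh downloaded, one can read their IDX headers: a big-endian magic number (2051 for images, 2049 for labels, whose low byte is the number of dimensions) followed by one 32-bit size per dimension. The training images should report 60000 images of 28x28. A minimal sketch; `MNIST_DIR` is a hypothetical location matching the paths used here.

```python
from __future__ import annotations

import struct
from pathlib import Path

MNIST_DIR = Path.home() / "caffe/data/mnist"  # adjust to your own Caffe checkout


def read_idx_header(data: bytes) -> tuple[int, tuple[int, ...]]:
    """Return (magic, dims) from the big-endian header at the start of an IDX file."""
    (magic,) = struct.unpack(">i", data[:4])
    ndim = data[3]  # the magic number's low byte is the number of dimensions
    dims = struct.unpack(f">{ndim}i", data[4 : 4 + 4 * ndim])
    return magic, dims


if __name__ == "__main__" and MNIST_DIR.is_dir():
    for path in sorted(MNIST_DIR.glob("*-ubyte")):
        with open(path, "rb") as f:
            magic, dims = read_idx_header(f.read(16))
        kind = {2051: "images", 2049: "labels"}.get(magic, "unknown")
        print(f"{path.name}: {kind}, shape {dims}")
```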
This generates two LMDB databases, mnist_train_lmdb (training set) and mnist_test_lmdb (test set), under caffe/examples/mnist/.
The LeNet-5 model description is in
caffe/examples/mnist/lenet_train_test.prototxt
The solver configuration is in
caffe/examples/mnist/lenet_solver.prototxt
The MNIST training script is
caffe/examples/mnist/train_lenet.sh
In the terminal enter:
cd /home/yt/caffe/
./examples/mnist/train_lenet.sh
The test results are shown below.
X. Install Theano
1. Install it directly:
sudo pip install theano
2. Configure the parameter file .theanorc:
sudo gedit ~/.theanorc
[global]
floatX=float32
device=gpu
base_compiledir=~/external/.theano/
allow_gc=False
warn_float64=warn
mode=FAST_RUN
[nvcc]
fastmath=True
[cuda]
root=/usr/local/cuda
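Theano also accepts the same settings through the THEANO_FLAGS environment variable, where options under [global] are written bare and options in other sections as section.option (for example nvcc.fastmath). The sketch below, with a hypothetical `theano_flags` helper, parses a .theanorc with Python's configparser and prints the equivalent THEANO_FLAGS string, which is also a quick way to spot a malformed section header.

```python
import configparser
import os


def theano_flags(rc_text: str) -> str:
    """Flatten .theanorc text into the equivalent THEANO_FLAGS string."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep option names case-sensitive, e.g. floatX
    cfg.read_string(rc_text)
    flags = []
    for section in cfg.sections():
        for key, value in cfg.items(section):
            # [global] options are top-level; others become section.option
            name = key if section == "global" else f"{section}.{key}"
            flags.append(f"{name}={value}")
    return ",".join(flags)


if __name__ == "__main__":
    rc_path = os.path.expanduser("~/.theanorc")
    if os.path.exists(rc_path):
        with open(rc_path) as f:
            print(theano_flags(f.read()))
```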
3. Run the test example:
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time
vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
# If any node in the compiled graph is still a CPU Elemwise op, the work ran on the CPU.
if numpy.any([isinstance(node.op, T.Elemwise) for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
XI. Install TensorFlow
1. Install Bazel, fetch the source, and configure the build
echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://storage.googleapis.com/bazel-apt/doc/apt-key.pub.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install bazel
sudo apt-get upgrade bazel
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout Branch # where Branch is the desired branch
git checkout r1.
sudo apt-get install python-numpy python-dev python-pip python-wheel
sudo apt-get install python3-numpy python3-dev python3-pip python3-wheel
sudo apt-get install libcupti-dev
./configure
$ ./configure # an example session follows
Please specify the location of python. [Default is /usr/bin/python]: y
Invalid python path. y cannot be found
Please specify the location of python. [Default is /usr/bin/python]:
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Do you wish to use jemalloc as the malloc implementation? [Y/n] y
jemalloc enabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] n
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] y
Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] n
No XLA JIT support will be enabled for TensorFlow
Found possible Python library paths:
/usr/local/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python2.7/dist-packages]
Using python library path: /usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with OpenCL support? [y/N] n
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] y
CUDA support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 5
Please specify the location where cuDNN 5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 6.1
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
........
INFO: All external dependencies fetched successfully.
Configuration finished
2. Build
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Check the name of the .whl file generated under /tmp/tensorflow_pkg, then install it:
sudo pip install /tmp/tensorflow_pkg/tensorflow-1.0.-cp27-cp27mu-linux_x86_64.whl
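Rather than copying the wheel name by hand (a stray space or typo breaks the pip command), it can be looked up and split into its PEP 427 fields: name, version, Python tag, ABI tag and platform. The cp27/cp27mu tags must match the Python 2.7 interpreter that pip runs under. A minimal sketch with hypothetical helpers; it prints the install command instead of running it.

```python
from __future__ import annotations

from pathlib import Path

PKG_DIR = "/tmp/tensorflow_pkg"  # output directory passed to build_pip_package above


def wheel_tags(filename: str) -> dict[str, str]:
    """Split a wheel file name into its PEP 427 fields (an optional build tag is ignored)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel name: {filename}")
    return {"name": parts[0], "version": parts[1],
            "python": parts[-3], "abi": parts[-2], "platform": parts[-1]}


def find_wheel(directory: str = PKG_DIR) -> Path:
    """Return the single tensorflow-*.whl in directory."""
    wheels = sorted(Path(directory).glob("tensorflow-*.whl"))
    if len(wheels) != 1:
        raise FileNotFoundError(f"expected one wheel in {directory}, found {len(wheels)}")
    return wheels[0]


if __name__ == "__main__" and Path(PKG_DIR).is_dir():
    whl = find_wheel()
    print(wheel_tags(whl.name))
    print(f"sudo pip install {whl}")
```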
3. Test
python
import tensorflow as tf
sess = tf.Session()