Using the Lab Server's GPUs and Running TensorFlow Code on Them

Time: 2024-03-08 08:53:24

Connecting to the server

Windows - XShell, XFtp, SSH

Connect to the lab server over SSH.

SSH is already familiar from GitHub and the OS course project.
The server currently in use is 192.168.7.169.

Tools: XShell and XFtp

Use XShell to connect to the server and run commands on it. Every node of the server runs Ubuntu 16.04 LTS.
Use XFtp to manage files.

References:
Xshell + Xftp: SSH tunnel proxying
Xshell: connecting to a Linux server with SSH keys and an SSH proxy, in detail

macOS - Terminal, Cyberduck

The computer at my lab desk is a Mac, so I had to relearn the tooling there.

Use Terminal to open the SSH connection.
Use Cyberduck to open an SFTP connection and manage files (FileZilla is also under consideration).
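As a rough illustration of what Terminal and an SFTP client do under the hood, the sketch below builds the OpenSSH command lines for the two connections. The username alice and the default port 22 are placeholders, not values from these notes; only the host address comes from the text.

```python
def ssh_command(user, host, port=22):
    """Return the argv list for an interactive SSH session (ssh uses lowercase -p)."""
    return ["ssh", "-p", str(port), f"{user}@{host}"]


def sftp_command(user, host, port=22):
    """Return the argv list for an SFTP session (sftp uses uppercase -P)."""
    return ["sftp", "-P", str(port), f"{user}@{host}"]


if __name__ == "__main__":
    # "alice" is a hypothetical account name; substitute your own.
    print(" ".join(ssh_command("alice", "192.168.7.169")))
    print(" ".join(sftp_command("alice", "192.168.7.169")))
```

The printed lines can be pasted into Terminal as they are.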
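References:
How to connect to a remote Linux server over SSH on a Mac (including the Cyberduck download)
Using the SSH client in the built-in Terminal on a Mac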

Setting up the environment - virtualenv

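Create a virtual environment and install packages inside it (Anaconda is another option).
Create the environment: virtualenv xxx_py, or virtualenv -p python3 xxx_py to use Python 3
Activate the environment: source xxx_py/bin/activate
Deactivate: deactivate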
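If virtualenv is not installed on a machine, Python 3's standard-library venv module produces a comparable environment. The sketch below uses venv rather than virtualenv, which is a substitution on my part and not the tool these notes use; the directory name xxx_py is kept from the text. The resulting environment is activated the same way, with source xxx_py/bin/activate.

```python
import os
import tempfile
import venv


def create_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir with the stdlib venv module.

    Returns the path of the pyvenv.cfg file that venv writes into the
    environment, which is a simple marker that creation succeeded.
    """
    venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    return os.path.join(env_dir, "pyvenv.cfg")


if __name__ == "__main__":
    # Build a throwaway environment in a temp directory; skip pip to keep it fast.
    target = os.path.join(tempfile.mkdtemp(), "xxx_py")
    print(create_env(target, with_pip=False))
```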
Using the Tsinghua PyPI mirror

For a single install:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple some-package
As the default index (requires pip 10.0 or later):
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
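The pip config set command stores the setting as an index-url key in the [global] section of pip's configuration file. As a minimal sketch, the snippet below renders that section with configparser so it is clear what ends up on disk; the helper name render_pip_config is made up for illustration, and the location of the file depends on the platform, so it is not written anywhere here.

```python
import configparser
import io

TUNA_INDEX = "https://pypi.tuna.tsinghua.edu.cn/simple"


def render_pip_config(index_url=TUNA_INDEX):
    """Return the text of a pip config file that sets the default package index."""
    parser = configparser.ConfigParser()
    parser["global"] = {"index-url": index_url}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_pip_config())
```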

References:
Tsinghua PyPI mirror usage help
virtualenv: introduction and basic usage
An essential tool for Python development: virtualenv
virtualenv - Liao Xuefeng's official website

Running TensorFlow code on the GPU

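GPU occupancy
By default, TensorFlow may claim nearly all the memory on every GPU visible to the process, which matters on a shared server.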

Check GPU usage: gpustat
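To choose a free GPU from a script instead of reading the table by eye, one option is to parse nvidia-smi's CSV output. This is a sketch that uses nvidia-smi (shipped with the NVIDIA driver) rather than gpustat; the helper names are invented for illustration, and "least memory in use" is only one possible definition of "free".

```python
import subprocess

# Ask nvidia-smi for one CSV row per GPU: index, used MiB, total MiB.
QUERY = ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_memory(csv_text):
    """Parse rows like '0, 1024, 11178' into (index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        rows.append((index, used, total))
    return rows


def least_used_gpu(rows):
    """Return the index of the GPU with the least memory currently in use."""
    return min(rows, key=lambda row: row[1])[0]


def query_gpus():
    """Run nvidia-smi and return the parsed rows (needs an NVIDIA driver installed)."""
    output = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(output)
```

On the server, least_used_gpu(query_gpus()) gives an index that can be passed on through CUDA_VISIBLE_DEVICES.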

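Add this to the Python code, before TensorFlow initializes the GPUs (safest is before importing it):

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'      # use GPU 0 only
# os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'  # alternative: use GPUs 0 and 1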

Restricting the process to specific GPUs:

CUDA_VISIBLE_DEVICES=1        Only device 1 will be seen
CUDA_VISIBLE_DEVICES=0,1      Devices 0 and 1 will be visible
CUDA_VISIBLE_DEVICES="0,1"    Same as above, quotation marks are optional
CUDA_VISIBLE_DEVICES=0,2,3    Devices 0, 2, 3 will be visible; device 1 is masked
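One detail the list above does not spell out: inside the process, the visible devices are renumbered from 0. With CUDA_VISIBLE_DEVICES=0,2,3, physical GPU 2 appears to the program as device 1. The helper below is a small sketch of that mapping; the function name is made up, and it handles only plain integer indices, not GPU UUIDs or invalid entries.

```python
def visible_device_map(cuda_visible_devices):
    """Map in-process device index -> physical GPU index for a CUDA_VISIBLE_DEVICES value."""
    value = cuda_visible_devices.strip().strip('"')
    physical = [int(part) for part in value.split(",") if part.strip()]
    return dict(enumerate(physical))


if __name__ == "__main__":
    # Physical GPUs 0, 2 and 3 are seen by the program as devices 0, 1 and 2.
    print(visible_device_map("0,2,3"))
```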

When launching the code:

CUDA_VISIBLE_DEVICES=0 python3 main.py
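The same launch can be done from a Python driver script, which is handy when starting several runs on different GPUs. This is a sketch under the assumption that the training script is called main.py as in the command above; gpu_env and run_on_gpus are illustrative names, not part of any library.

```python
import os
import subprocess
import sys


def gpu_env(gpu_ids, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set to gpu_ids."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def run_on_gpus(script, gpu_ids):
    """Run a Python script in a child process that can only see the given GPUs."""
    return subprocess.run([sys.executable, script], env=gpu_env(gpu_ids)).returncode


if __name__ == "__main__":
    # Show the variable a child process would receive for GPUs 0 and 1.
    print(gpu_env([0, 1])["CUDA_VISIBLE_DEVICES"])
```

Because the variable is set only in the child's environment, the parent process and other runs are unaffected.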

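TensorFlow itself offers two ways to control GPU memory use (this is the TF 1.x API; under TensorFlow 2.x the same classes live in tf.compat.v1):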

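Allocate GPU memory on demand at runtime, taking only as much as is needed:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # grow the allocation instead of claiming everything up front
session = tf.Session(config=config)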
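For anyone on TensorFlow 2.x, where tf.Session is no longer the usual entry point, the closest equivalent I know of is the per-device memory-growth switch. The sketch below assumes TensorFlow 2.1 or later and must run before any GPU has been initialized; the wrapper function is illustrative, and it simply reports 0 when TensorFlow is not installed.

```python
def enable_memory_growth():
    """Turn on on-demand GPU memory allocation; return how many GPUs were configured."""
    try:
        import tensorflow as tf  # assumed to be TensorFlow 2.1 or later
    except ImportError:
        return 0  # TensorFlow not installed: nothing to configure
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must be called before the GPU is first used by the runtime.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


if __name__ == "__main__":
    print(enable_memory_growth())
```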

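Cap the share of each GPU's memory the process may use (this limits memory, not compute utilization):

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.4)  # at most 40% of each visible GPU's memory
config = tf.ConfigProto(gpu_options=gpu_options)
session = tf.Session(config=config)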
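To get a feel for what a fraction of 0.4 means in absolute terms, the budget is just the fraction times the card's total memory. A quick sketch of the arithmetic follows; the 11178 MiB total is a made-up example figure, not a measurement from the lab server.

```python
def memory_limit_mib(total_mib, fraction):
    """Memory budget in MiB for a given per-process fraction of one GPU's memory."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_mib * fraction)


if __name__ == "__main__":
    # Hypothetical card with 11178 MiB in total, capped at 40%.
    print(memory_limit_mib(11178, 0.4))
```

With these numbers, two such processes fit on one card, while a third would push the total past what the card has.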

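TensorFlow code
For now I have not tried to assign individual parts of the code to the GPU or the CPU by hand.
Instead, the whole network definition is wrapped in with tf.device(self.device):
and config = tf.ConfigProto(gpu_options=gpu_options, allow_soft_placement=True) lets TensorFlow fall back to another device for any op it cannot place on the requested one.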
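The notes do not show how self.device is chosen, so the following is only a guess at one simple way to do it: a helper that returns a TensorFlow device string, with its intended use shown in comments. The name pick_device and the use_gpu flag are hypothetical.

```python
def pick_device(use_gpu, gpu_index=0):
    """Return a TensorFlow device string suitable for tf.device()."""
    return f"/gpu:{gpu_index}" if use_gpu else "/cpu:0"


# Intended use inside a model class (TF 1.x style, as in the text):
#
#     self.device = pick_device(use_gpu=True)
#     with tf.device(self.device):
#         ...build the whole network here...
#
# gpu_index counts only the GPUs left visible by CUDA_VISIBLE_DEVICES,
# so "/gpu:0" is the first visible GPU, not necessarily physical GPU 0.

if __name__ == "__main__":
    print(pick_device(True), pick_device(False))
```

With allow_soft_placement=True, a "/gpu:0" request on a machine without a usable GPU falls back to the CPU rather than failing.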
References:
Setting the GPU and GPU memory usage in TensorFlow
Using GPUs with TensorFlow
A small TensorFlow GPU test

