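When choosing a storage system, people usually weigh its capacity, its fault-tolerance and redundancy mechanisms, load balancing, and so on. Ceph is a very good distributed storage system: a cluster can keep growing by adding nodes, and it unifies block storage, object storage, and a file system in one distributed system. That is why many companies choose Ceph as their storage system.
A quick Ceph deployment tends to run into quite a few problems, though. Below is a summary of the ones I hit; comments and corrections are welcome!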
Installation
# ceph-deploy install admin-node node1
Errors:
(1) [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y install epel-release
Fix:
Go to /etc/yum.repos.d and delete epel.repo and epel-testing.repo.
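Deleting the repo files cannot be undone. As a gentler variant, here is a minimal sketch that renames them instead, relying on the fact that yum only reads files whose names end in .repo. The function name disable_epel_repos and the .bak suffix are my own choices for illustration; they are not part of ceph-deploy.

```shell
# Move the EPEL repo definitions aside so yum stops reading them.
# Argument: repo directory (defaults to /etc/yum.repos.d).
disable_epel_repos() {
    local repo_dir="${1:-/etc/yum.repos.d}"
    local f
    for f in epel.repo epel-testing.repo; do
        if [ -f "$repo_dir/$f" ]; then
            # yum ignores files that do not end in .repo
            mv "$repo_dir/$f" "$repo_dir/$f.bak"
            echo "disabled $f"
        fi
    done
}
```

Run disable_epel_repos as root on the affected node, then retry ceph-deploy install. Renaming the .bak files back restores the repos.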
(2) [ceph_deploy][ERROR ] RuntimeError: NoSectionError: No section: 'ceph'
Fix:
# yum -y remove ceph-release
(3) [admin-node][WARNIN] Another app is currently holding the yum lock; waiting for it to exit...
Fix:
# rm -f /var/run/yum.pid
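Removing the lock file blindly can interfere with a yum process that is still doing real work. A slightly more careful sketch first checks whether the PID recorded in the file is alive. The function name is hypothetical, and kill -0 only gives a reliable answer when the script runs as root.

```shell
# Remove the yum lock only if the process that created it is gone.
# Argument: path to the pid file (defaults to /var/run/yum.pid).
# Returns 1 and leaves the file alone if the recorded process is still alive.
clear_stale_yum_lock() {
    local pidfile="${1:-/var/run/yum.pid}"
    local pid
    if [ ! -f "$pidfile" ]; then
        echo "no lock file at $pidfile"
        return 0
    fi
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the process exists
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "process $pid still holds the lock; leaving it in place"
        return 1
    fi
    rm -f "$pidfile"
    echo "removed stale lock $pidfile"
}
```

If the function reports that the process is still alive, it is usually safer to wait for it or to inspect it with ps before deciding to kill it.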
(4) RuntimeError: Failed to execute command: yum -y install ceph ceph-radosgw
Fix: install the libunwind package manually.
# rpm -ivh libunwind-1.1-10.el7.x86_64.rpm
(5) [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y install yum-plugin-priorities
Fix:
Go to /etc/yum.repos.d and delete epel.repo and epel-testing.repo.
(6) TTY error
[node1][DEBUG ] connection detected need for sudo
sudo: sorry, you must have a tty to run sudo
[node1][DEBUG ] connected to host: node1
[ceph_deploy][ERROR ] RuntimeError: remote connection got closed, ensure ``requiretty`` is disabled for node1
Fix: on the affected node, open the sudoers file with visudo, comment out the global requiretty line, and add an exemption for the ceph user:
# visudo
#Defaults requiretty
Defaults:ceph !requiretty
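An alternative to editing the main sudoers file by hand is a drop-in file. The sketch below assumes the node's sudoers file includes /etc/sudoers.d (the default on CentOS 7, but worth checking) and that the deployment user is called ceph, as in the line above. The function name and the file name are made up for illustration.

```shell
# Write a sudoers drop-in that exempts one user from requiretty.
# Arguments: user name (default: ceph), drop-in directory (default: /etc/sudoers.d).
# Prints the path of the file it wrote.
write_requiretty_override() {
    local user="${1:-ceph}"
    local dir="${2:-/etc/sudoers.d}"
    # sudo skips drop-in files whose names contain a dot or end in ~
    local file="$dir/${user}-requiretty"
    printf 'Defaults:%s !requiretty\n' "$user" > "$file" || return 1
    # sudoers files are expected to be read-only for owner and group
    chmod 0440 "$file"
    echo "$file"
}
```

After running it as root, visudo -cf on the printed path checks the syntax of the new file before the next ceph-deploy run.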
(7) ** ERROR: error creating empty object store in /var/local/osd0: (13) Permission denied
[admin-node][WARNIN]
[admin-node][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: /usr/sbin/ceph-disk -v activate --mark-init systemd --mount /var/local/osd0
Fix:
On every node, open up the permissions on /var/local/osd0/ and /var/local/osd1/,
as follows:
chmod 777 /var/local/osd0/
chmod 777 /var/local/osd0/*
chmod 777 /var/local/osd1/
chmod 777 /var/local/osd1/*
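chmod 777 leaves the OSD directories world-writable. On releases where the Ceph daemons run as a dedicated ceph user (an assumption about this cluster, not something the log above proves), handing the directories to that user is a narrower fix. This is only a sketch under that assumption, and the helper name is hypothetical.

```shell
# Give each listed OSD directory (recursively) to one owner.
# Usage: fix_osd_dir_owner OWNER DIR...
# Returns 1 if any directory is missing or chown fails.
fix_osd_dir_owner() {
    local owner="$1" dir rc=0
    shift
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            chown -R "$owner" "$dir" || rc=1
        else
            echo "skipping $dir: not a directory" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

On each node, as root, the call would look like fix_osd_dir_owner ceph:ceph /var/local/osd0 /var/local/osd1, followed by a retry of the activate step.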
(8) Connection timeout
[root@admin-node my-cluster]# ceph-deploy osd activate node1:/var/local/osd1
(9) OSD node does not come up
[root@admin-node my-cluster]# ceph -s
    cluster beca06a9-c4c3-443d-9d85-fbc6d620c173
     health HEALTH_WARN
            64 pgs degraded
            64 pgs stuck unclean
            64 pgs undersized
     monmap e1: 1 mons at {admin-node=10.0.3.88:6789/0}
            election epoch 3, quorum 0 admin-node
     osdmap e7: 2 osds: 1 up, 1 in
            flags sortbitwise
      pgmap v29: 64 pgs, 1 pools, 0 bytes data, 0 objects
            6644 MB used, 44544 MB / 51188 MB avail
                  64 active+undersized+degraded
[root@admin-node my-cluster]# ceph osd tree
ID WEIGHT  TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.09760 root default
-2 0.04880     host admin-node
 0 0.04880         osd.0            up  1.00000          1.00000
-3 0.04880     host node1
 1 0.04880         osd.1          down        0          1.00000
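Fix:
1. Turn off the firewall.
2. Check the passwordless SSH login configuration.
3. OSD nodes are generally slow to start; waiting about 5 minutes is usually enough.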
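Rather than re-running ceph osd tree by hand during that wait, a small polling loop can watch for the OSD to come up. This is a rough sketch: the function names, the 300-second default timeout, and the 10-second interval are my own choices, and the parsing assumes the status column directly follows the osd.N name, as in the output above.

```shell
# Print the status ("up" or "down") of osd.<id> as reported by `ceph osd tree`.
osd_status() {
    ceph osd tree | awk -v name="osd.$1" '
        { for (i = 1; i < NF; i++) if ($i == name) { print $(i + 1); exit } }'
}

# Poll until osd.<id> is up or the timeout expires.
# Usage: wait_for_osd_up ID [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_osd_up() {
    local id="$1"
    local timeout="${2:-300}"
    local interval="${3:-10}"   # must be greater than 0
    local waited=0
    while [ "$waited" -le "$timeout" ]; do
        if [ "$(osd_status "$id")" = "up" ]; then
            echo "osd.$id is up after about ${waited}s"
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "osd.$id is still not up after ${timeout}s" >&2
    return 1
}
```

For the cluster shown above, wait_for_osd_up 1 on the admin node would wait for osd.1. If it times out, the firewall and SSH checks are the next things to look at.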