Containerized Applications: Building a Multi-Node OpenShift Cluster on Alibaba Cloud

Date: 2020-12-14 03:03:01

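Overview

This experiment uses a two-node cluster.
The compute node needs less configuration, so I suggest setting it up first and then switching to the Master node and taking your time there.
I launched two pay-as-you-go ECS instances in Alibaba Cloud's US (Silicon Valley) region: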

  • Master: 2 CPUs, 16 GB RAM, CentOS 7.4 64-bit
  • Node1: 1 CPU, 8 GB RAM, CentOS 7.4 64-bit

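Unfortunately, custom images can only be copied across regions within mainland China. Still, using the US network let me get the whole process working end to end; connections from mainland China to overseas hosts are painfully slow.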

Configuration

The compute node and the control (Master) node are configured slightly differently, as shown below.

Compute node

# Set the hostname
hostnamectl set-hostname node1.example.com
# Install dependencies
yum install -y docker wget git net-tools bind-utils iptables-services bridge-utils bash-completion
# Enable and start the Docker service
systemctl enable docker; systemctl start docker
# Enable and start NetworkManager
systemctl enable NetworkManager; systemctl start NetworkManager
# Stop and disable the firewall
systemctl stop firewalld; systemctl disable firewalld
# Ansible conflicts with the urllib3 that ships with the system, so uninstall it. The error was: Error unpacking rpm package python-urllib3-1.10.2-3.el7.noarch
pip uninstall urllib3
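
The compute-node steps above can also be collected into one function with a dry-run switch, so the command list can be reviewed before anything is changed. This is only a sketch: `DRY_RUN`, `run`, and `prepare_node` are names made up for illustration, and the `pip uninstall urllib3` step is left out because it asks for confirmation interactively.

```shell
#!/bin/sh
# Sketch: the compute-node preparation as one function.
# DRY_RUN, run and prepare_node are illustrative names, not OpenShift tooling.

run() {
    # Print the command when DRY_RUN=1, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prepare_node() {
    node_name="$1"
    # The && chain stops at the first step that fails.
    run hostnamectl set-hostname "$node_name" &&
    run yum install -y docker wget git net-tools bind-utils \
        iptables-services bridge-utils bash-completion &&
    run systemctl enable docker &&
    run systemctl start docker &&
    run systemctl enable NetworkManager &&
    run systemctl start NetworkManager &&
    run systemctl stop firewalld &&
    run systemctl disable firewalld
}
```

Running `DRY_RUN=1 prepare_node node1.example.com` only prints the eight commands; without `DRY_RUN` they are executed in order.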

Master (control) node

# Set the hostname
hostnamectl set-hostname master.example.com

# Local name resolution
echo "172.20.62.195 master.example.com" >> /etc/hosts
echo "172.20.62.196 node1.example.com" >> /etc/hosts
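
The two `echo` lines above append unconditionally, so running them twice leaves duplicate entries in /etc/hosts. A minimal sketch of an idempotent variant follows; `add_host_entry` is a hypothetical helper name, and the optional third argument exists only so the function can be tried on a scratch file.

```shell
#!/bin/sh
# add_host_entry IP NAME [FILE]
# Append "IP NAME" to FILE (default /etc/hosts) unless NAME is already
# present as a host name on a non-comment line.
add_host_entry() {
    ip="$1"
    name="$2"
    file="${3:-/etc/hosts}"
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$file" 2>/dev/null; then
        return 0    # already mapped, nothing to do
    fi
    echo "$ip $name" >> "$file"
}
```

With this helper, `add_host_entry 172.20.62.195 master.example.com` can be repeated safely.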

# Install dependencies
yum install -y docker wget git net-tools bind-utils iptables-services bridge-utils bash-completion

# Enable and start the Docker service
systemctl enable docker; systemctl start docker

# Enable and start NetworkManager
systemctl enable NetworkManager; systemctl start NetworkManager

# Stop and disable the firewall
systemctl stop firewalld; systemctl disable firewalld

# Ansible conflicts with the urllib3 that ships with the system, so uninstall it. The error was: Error unpacking rpm package python-urllib3-1.10.2-3.el7.noarch
pip uninstall urllib3

# Install, enable, and start the etcd distributed key-value store
yum -y install etcd
systemctl enable etcd; systemctl start etcd

# Install the EPEL repository
yum -y install https://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-10.noarch.rpm

# Disable the EPEL repository by default (enabled=0)
sed -i -e "s/^enabled=1/enabled=0/" /etc/yum.repos.d/epel.repo

# Install Ansible and pyOpenSSL from EPEL
yum -y --enablerepo=epel install ansible pyOpenSSL

# Generate an SSH key pair
ssh-keygen -f /root/.ssh/id_rsa -N ''

# Copy the public key to every node in the cluster for passwordless access
for host in master.example.com node1.example.com; do ssh-copy-id -i ~/.ssh/id_rsa.pub "$host"; done

# Download openshift-ansible
wget https://github.com/openshift/openshift-ansible/archive/openshift-ansible-3.7.0-0.126.0.tar.gz
tar zxvf openshift-ansible-3.7.0-0.126.0.tar.gz

# Back up the default inventory
cp /etc/ansible/hosts /etc/ansible/hosts.bak

# Configure /etc/ansible/hosts
# Replace the contents of /etc/ansible/hosts with the block below
# Create an OSEv3 group that contains the masters and nodes groups
[OSEv3:children]
masters
nodes
etcd

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
# SSH user, this user should allow ssh based auth without requiring a password
ansible_ssh_user=root
openshift_deployment_type=origin
openshift_release=3.6.0
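# If the CPU and memory meet the requirements, you can comment out openshift_disable_check
# The Master node requires 2 CPU cores, 16 GB of RAM, and 40 GB of disk
# A Node requires 1 CPU core, 8 GB of RAM, and 20 GB of disk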
openshift_disable_check=disk_availability,docker_storage,memory_availability,docker_image_availability

# uncomment the following to enable htpasswd authentication; defaults to DenyAllPasswordIdentityProvider
openshift_master_identity_providers=[{'name':'htpasswd_auth','login':'true','challenge':'true','kind':'HTPasswdPasswordIdentityProvider','filename':'/etc/origin/master/htpasswd'}]

# host group for masters
[masters]
master.example.com

# host group for nodes, includes region info
[nodes]
master.example.com
node1.example.com openshift_node_labels="{'region': 'infra', 'zone': 'east'}"

[etcd]
master.example.com
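
Before starting the installer, it can help to confirm which hosts the inventory actually places in each group. The sketch below reads an INI-style inventory like the one above and prints the hosts under one group, each host once; `inventory_hosts` is a hypothetical helper written for this article, not an Ansible command.

```shell
#!/bin/sh
# inventory_hosts FILE GROUP
# Print the host names listed under [GROUP] in an INI-style inventory.
inventory_hosts() {
    awk -v group="[$2]" '
        # A line starting with "[" opens a new section.
        /^\[/ { in_group = ($1 == group); next }
        # Inside the wanted section: skip blanks and comments, print each host once.
        in_group && NF && $1 !~ /^#/ && !seen[$1]++ { print $1 }
    ' "$1"
}

# Example: report any node that does not resolve on this machine.
# for h in $(inventory_hosts /etc/ansible/hosts nodes); do
#     getent hosts "$h" >/dev/null || echo "unresolved: $h"
# done
```

The commented loop uses `getent hosts`, which consults /etc/hosts as well as DNS, so it should flag a node that was left out of the local name resolution step.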

Start the installation and wait for the result:

ansible-playbook ~/openshift-ansible-openshift-ansible-3.7.0-0.126.0/playbooks/byo/config.yml
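
The playbook prints a lot of output, and the part that matters most is the PLAY RECAP at the end, where each host gets a line of the form `host : ok=N changed=N unreachable=N failed=N`. As a rough sketch, the function below scans saved output for hosts with non-zero `unreachable` or `failed` counts. `failed_hosts` and the `install.log` file name are assumptions for illustration.

```shell
#!/bin/sh
# failed_hosts: read ansible-playbook output on stdin and print every host
# whose recap line reports an unreachable or failed count above zero.
failed_hosts() {
    awk '
        /unreachable=[0-9]+/ && /failed=[0-9]+/ {
            bad = 0
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if ((kv[1] == "unreachable" || kv[1] == "failed") && kv[2] + 0 > 0)
                    bad = 1
            }
            if (bad) print $1
        }
    '
}

# Example (install.log is a hypothetical file name):
# ansible-playbook ~/openshift-ansible-openshift-ansible-3.7.0-0.126.0/playbooks/byo/config.yml 2>&1 | tee install.log
# failed_hosts < install.log
```

An empty result means no host reported failures in the recap; it does not replace reading the error messages when something does fail.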

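Afterwards

If something goes wrong, copy the error message and search for it on Google; Baidu will not have it! If everything went well, you can inspect the cluster with the commands below.

List the nodes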

oc get nodes
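
Right after installation a node may take a while to report Ready. A small sketch for spotting stragglers is shown below: it reads the output of `oc get nodes --no-headers` and prints nodes whose STATUS column does not begin with `Ready`. The function name `not_ready_nodes` is made up for this example, and the sketch assumes the node name and status are the first two columns.

```shell
#!/bin/sh
# not_ready_nodes: read `oc get nodes --no-headers` output on stdin and print
# the name of every node whose STATUS does not start with "Ready".
# A status such as "Ready,SchedulingDisabled" still counts as ready here.
not_ready_nodes() {
    awk 'NF >= 2 && $2 !~ /^Ready/ { print $1 }'
}

# Example:
# oc get nodes --no-headers | not_ready_nodes
```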

Who am I

Which user is currently logged in?

oc whoami

Show the list of cluster resources

oc get all -o wide

Create a user

htpasswd -b /etc/origin/master/htpasswd dev dev
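
The `htpasswd` tool stores one `user:hash` entry per line, so a quick way to confirm the command above took effect is to look for the user name in the file. The sketch below does that; `htpasswd_has_user` is a hypothetical helper, and on CentOS the `htpasswd` command itself normally comes from the httpd-tools package.

```shell
#!/bin/sh
# htpasswd_has_user FILE USER
# Succeed (exit 0) when FILE contains an entry whose user name is exactly USER.
htpasswd_has_user() {
    awk -F: -v u="$2" '$1 == u { found = 1 } END { exit !found }' "$1"
}

# Example:
# htpasswd_has_user /etc/origin/master/htpasswd dev && echo "dev exists"
```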

Log in as the cluster administrator

oc login -u system:admin

Grant the dev account the cluster-admin role

oc adm policy add-cluster-role-to-user cluster-admin dev

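Tunneling

master.example.com and node1.example.com are resolved through the local /etc/hosts file, so they cannot be reached from the public internet. For public access, you can use DNS.

Add the following line to /etc/hosts on your local machine:

127.0.0.1 master.example.com

Run the following command to open a tunnel to the remote Master: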

ssh -L 127.0.0.1:8443:master.example.com:8443 root@47.88.54.94
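
The `-L` argument has four parts: local bind address, local port, target host, and target port. The target host name is resolved on the remote side, which is why master.example.com works here even though it only exists in the server's /etc/hosts. The sketch below builds the same forward from its parts and adds `-N`, so the session carries only the tunnel and opens no remote shell. `tunnel_spec` and `open_tunnel` are illustrative names.

```shell
#!/bin/sh
# tunnel_spec LOCAL_PORT TARGET_HOST TARGET_PORT
# Build the argument for ssh -L, binding the local end to 127.0.0.1 only.
tunnel_spec() {
    echo "127.0.0.1:$1:$2:$3"
}

# open_tunnel GATEWAY
# Forward local port 8443 to port 8443 on master.example.com via GATEWAY.
# -N tells ssh not to run a remote command.
open_tunnel() {
    ssh -N -L "$(tunnel_spec 8443 master.example.com 8443)" "$1"
}

# Example:
# open_tunnel root@47.88.54.94
```

The tunnel stays up until the `ssh` process is interrupted, for example with Ctrl-C.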

47.88.54.94 was a real IP address at the time, but there is no telling who will be using it later!!!

Then open https://master.example.com:8443 in a browser.
