postgresql 主从复制并切换

时间:2022-09-21 12:37:41

1 环境

192.168.19.145 Kylin 3.3 mysqlhq  9.5.2  psql_master
192.168.19.227 Kylin 3.3 mysql3    9.5.2  psql_standby
postgres=# select version();
version
-------------------------------------------------------------------------------------------------------------
PostgreSQL 9.5.2 on x86_64-kylin-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit
prot=5432

 

2 yum 安装pgsql

[root@mysqlhq ~]# yum list | grep postgresql
[root@mysqlhq ~]# yum install postgresql-server postgresql-contrib
[root@mysqlhq ~]# service postgresql initdb
[root@mysqlhq ~]# systemctl start postgresql
[root@mysqlhq ~]# systemctl status postgresql
[root@mysqlhq ~]# netstat -lnt|grep 5432
tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN
[root@mysqlhq ~]# ps -ef|grep postgres
[root@mysqlhq ~]# su - postgres
-bash-4.2$ psql
psql (9.5.2)
Type "help" for help.

3 修改主库的配置文件

#default 路径///var/lib/pgsql/data
[root@mysqlhq ~]# cd /var/lib/pgsql/data/
[root@mysqlhq data]# cp postgresql.conf postgresql.conf.bk20181213
[root@mysqlhq data]# vim postgresql.conf
listen_addresses = '*'
max_connections = 1000
wal_level = hot_standby
#synchronous_commit = on(同步复制,-- 实时,如果需要异步改为off)
checkpoint_timeout = 5min
archive_mode = on
archive_command = '/bin/date'
max_wal_senders = 2
wal_keep_segments = 16
hot_standby = on
max_standby_archive_delay = 300s
max_standby_streaming_delay = 30s
wal_receiver_status_interval = 1s
hot_standby_feedback = on
wal_receiver_timeout = 60s

-bash-4.2$ cat /var/lib/pgsql/data/pg_hba.conf | grep -v '^#' | grep -v '^$'
local all all peer
host all all 127.0.0.1/32 trust
host all all 10.15.7.115/32 trust
host all all 192.168.19.145/32 md5
host all all ::1/128 ident
host replication repuser 192.168.19.227/32 md5

[root@mysqlhq data]# su - postgres
Last login: Tue Dec 11 16:34:47 CST 2018 on pts/3
-bash-4.2$ psql
psql (9.5.2)
Type "help" for help.

postgres=# create role repuser login replication encrypted password 'pgreplication';
CREATE ROLE

-bash-4.2$ /usr/bin/pg_ctl stop
waiting for server to shut down.... done
server stopped
-bash-4.2$ /usr/bin/pg_ctl start
server starting
-bash-4.2$ netstat -lnt|grep 5432
tcp 0 0 0.0.0.0:5432 0.0.0.0:* LISTEN
tcp6 0 0 :::5432 :::* LISTEN
查看日志
tail -f -n 100 /var/lib/pgsql/data/pg_log/postgresql-Thu.log

从库配置
1 基础备份
从库安装完成后,不初始化,若已经初始化,删除其data目录
[root@mysql3 pgsql]# rm -rf data/

[root@mysql3 pgsql]# /usr/bin/pg_basebackup -D /var/lib/pgsql/data -F p -X stream -R -v -P -h 192.168.19.145 -p 5432 -U repuser
口令:
事务日志起始于时间点: 0/7000028, 基于时间表1
pg_basebackup: 启动后台 WAL 接收进程
73823/73876 kB (100%), 2/2 表空间
pg_basebackup: 无法得到来自服务器的事务日志终止位置: ERROR: could not open file "./postgresql.conf.bk20181213": Permission denied
删除文件./postgresql.conf.bk20181213,这些配置文件的备份需要放在非data目录

-R 备份后对文件recovery.conf进行写操作
删除数据目录,重新pg_basebackup
[root@mysql3 pgdata]# /usr/bin/pg_basebackup -D /var/lib/pgsql/data -F p -X stream -R -v -P -h 192.168.19.145 -p 5432 -U repuser
口令:
事务日志起始于时间点: 0/9000028, 基于时间表1
pg_basebackup: 启动后台 WAL 接收进程
73859/73859 kB (100%), 2/2 表空间
transaction log end point: 0/90000F8
pg_basebackup: 等待后台进程结束流操作...
pg_basebackup: base backup completed
2 配置备库参数postgresql.conf
[root@mysql3 data]# vim postgresql.conf
#在基础备份时,初始化文件是从主库复制来的,所以配置文件一致,注释掉
wal_level,
max_wal_senders 
wal_keep_segments等参数
打开如下参数:

hot_standby = on   #在备份的同时允许查询
max_standby_streaming_delay = 30s #可选,流复制最大延迟
wal_receiver_status_interval = 10s #可选,从向主报告状态的最大间隔时间
hot_standby_feedback = on #可选,查询冲突时向主反馈
max_connections = 1000 #默认参数,非主从配置相关参数,表示到数据库的连接数,一般从库做主要的读服务时,设置值需要高于主库
3 创建恢复文件recovery.conf
yum安装的pg,所以在pg_basebackup的时候加上-R参数,生成recovery.conf
[root@mysql3 data]# cat recovery.conf
standby_mode = 'on' #指明从库身份
primary_conninfo = 'user=repuser password=pgreplication host=192.168.19.145 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres' #连接到主库信息
[root@mysql3 data]# vim recovery.conf
recovery_target_timeline = 'latest' #同步到最新数据

#trigger_file = '/postgres/data/trigger_activestandby'
指定触发文件,文件存在时,将触发从库提升为主库,前提是必须设置”standby_mode = on”;如果不设置此参数,也可采用”pg_ctl promote“触发从库切换成主库

chown -R postgres:postgres /var/lib/pgsql/data
chown -R postgres:postgres /usr/local/pgdata
[root@mysql3 data]# /usr/bin/pg_ctl start
pg_ctl: 无法以 root 用户运行
请以服务器进程所属用户 (非特权用户) 登录 (或使用 "su")

[root@mysql3 data]# su - postgres
上一次登录:四 12月 13 18:01:23 CST 2018pts/1 上
-bash-4.2$ /usr/bin/pg_ctl start
正在启动服务器进程
-bash-4.2$ LOG: redirecting log output to logging collector process
HINT: Future log output will appear in directory "pg_log".

4 验证主从
1 主库进程 sender
-bash-4.2$ ps -ef|grep postgres
postgres 4087 26671 0 09:44 ? 00:00:00 postgres: wal sender process repuser 192.168.19.227(55336) streaming 0/D000140
postgres 4093 24928 31 09:45 pts/3 00:00:00 ps -ef
postgres 4094 24928 7 09:45 pts/3 00:00:00 grep --color=auto postgres
root 24927 30012 0 Dec13 pts/3 00:00:00 su - postgres
postgres 24928 24927 0 Dec13 pts/3 00:00:00 -bash
postgres 26671 1 0 Dec13 pts/3 00:00:04 /usr/bin/postgres
postgres 26672 26671 0 Dec13 ? 00:00:00 postgres: logger process
postgres 26674 26671 0 Dec13 ? 00:00:02 postgres: checkpointer process
postgres 26675 26671 0 Dec13 ? 00:00:00 postgres: writer process
postgres 26676 26671 0 Dec13 ? 00:00:01 postgres: wal writer process
postgres 26677 26671 0 Dec13 ? 00:00:04 postgres: autovacuum launcher process
postgres 26678 26671 0 Dec13 ? 00:00:00 postgres: archiver process last was 00000001000000000000000C
postgres 26679 26671 0 Dec13 ? 00:00:05 postgres: stats collector process

2 验证从库 recovering
-bash-4.2$ ps -ef|grep postgres
root 32525 9758 0 09:44 pts/1 00:00:00 su - postgres
postgres 32526 32525 0 09:44 pts/1 00:00:00 -bash
postgres 32563 1 0 09:44 pts/1 00:00:00 /usr/bin/postgres
postgres 32564 32563 0 09:44 ? 00:00:00 postgres: logger process
postgres 32565 32563 0 09:44 ? 00:00:00 postgres: startup process recovering 00000001000000000000000D
postgres 32566 32563 0 09:44 ? 00:00:00 postgres: checkpointer process
postgres 32567 32563 0 09:44 ? 00:00:00 postgres: writer process
postgres 32568 32563 0 09:44 ? 00:00:00 postgres: stats collector process
postgres 32569 32563 0 09:44 ? 00:00:00 postgres: wal receiver process streaming 0/D000140
postgres 32586 32526 0 09:46 pts/1 00:00:00 ps -ef
postgres 32587 32526 0 09:46 pts/1 00:00:00 grep --color=auto postgres

3 sql查询主从状态
-bash-4.2$ psql
psql (9.5.2)
Type "help" for help.

postgres=# select * from pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay
_location | sync_priority | sync_state
------+----------+---------+------------------+----------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-------
----------+---------------+------------
4087 | 25404 | repuser | walreceiver | 192.168.19.227 | | 55336 | 2018-12-14 09:44:45.165618+08 | 2103 | streaming | 0/D000140 | 0/D000140 | 0/D000140 | 0/D000
140 | 0 | async
(1 row)
select pg_is_in_recovery();

4 主从数据测试
主库 192.168.19.145
postgres-# \c yhq1
yhq1=# CREATE TABLE yhq22(id integer not null,date TIMESTAMP NOT NULL DEFAULT LOCALTIMESTAMP(0));
CREATE TABLE
yhq1=# INSERT INTO yhq22 (id) VALUES (1);
INSERT 0 1
yhq1=# INSERT INTO yhq22 (id) VALUES (1);
INSERT 0 1
yhq1=# INSERT INTO yhq22 (id) VALUES (1);
INSERT 0 1
yhq1=# commit;
WARNING: there is no transaction in progress
COMMIT
yhq1=# select * from yhq22;
id | date
----+---------------------
1 | 2018-12-14 09:52:51
1 | 2018-12-14 09:52:53
1 | 2018-12-14 09:52:53
(3 rows)
从库
postgres=# \c yhq1
您现在已经连接到数据库 "yhq1",用户 "postgres".
yhq1=# \d
关联列表
架构模式 | 名称 | 类型 | 拥有者
----------+-------+--------+----------
public | yhq | 数据表 | postgres
public | yhq22 | 数据表 | postgres
(2 行记录)

yhq1=# select * from yhq22;
id | date
----+---------------------
1 | 2018-12-14 09:52:51
1 | 2018-12-14 09:52:53
1 | 2018-12-14 09:52:53
(3 行记录)
yhq1=# INSERT INTO yhq22 (id) VALUES (2);
ERROR: cannot execute INSERT in a read-only transaction

5命令查看主从状态
192.168.19.145
-bash-4.2$ pg_controldata /var/lib/pgsql/data/
pg_control version number: 942
Catalog version number: 201510051
Database system identifier: 6606088538881141528
Database cluster state: in production
pg_control last modified: Fri 14 Dec 2018 10:33:14 AM CST
Latest checkpoint location: 0/D017540
Prior checkpoint location: 0/D017460
Latest checkpoint's REDO location: 0/D017508
Latest checkpoint's REDO WAL file: 00000001000000000000000D
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0/2107
Latest checkpoint's NextOID: 33597
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 1823
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 2107
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 24598
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: Fri 14 Dec 2018 10:33:14 AM CST
192.168.19.227
-bash-4.2$ pg_controldata /var/lib/pgsql/data/
pg_control 版本: 942
Catalog 版本: 201510051
数据库系统标识符: 6606088538881141528
数据库簇状态: 正在归档恢复
pg_control 最后修改: 2018年12月14日 星期五 10时34分46秒
最新检查点位置: 0/D017540
优先检查点位置: 0/D017460
最新检查点的 REDO 位置: 0/D017508
最新检查点的重做日志文件: 00000001000000000000000D
最新检查点的 TimeLineID: 1
最新检查点的PrevTimeLineID: 1
最新检查点的full_page_writes: 开启
最新检查点的 NextXID: 0/2107
最新检查点的 NextOID: 33597
最新检查点的NextMultiXactId: 1
最新检查点的NextMultiOffsetD: 0
最新检查点的oldestXID: 1823
最新检查点的oldestXID所在的数据库:1
最新检查点检查oldestActiveXID: 2107
最新检查点检查oldestMultiXid: 1
最新检查点的oldestMulti所在的数据库:24598
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
最新检查点的时间: 2018年12月14日 星期五 10时33分14秒
不带日志的关系: 0/1使用虚假的LSN计数器

 

6 主从切换
192.168.19.145
-bash-4.2$ pg_ctl stop
waiting for server to shut down...... done
server stopped
-bash-4.2$ pg_controldata /var/lib/pgsql/data/
pg_control version number: 942
Catalog version number: 201510051
Database system identifier: 6606088538881141528
Database cluster state: shut down
pg_control last modified: Fri 14 Dec 2018 10:38:54 AM CST
Latest checkpoint location: 0/E000028
Prior checkpoint location: 0/D017620
Latest checkpoint's REDO location: 0/E000028
从 192.168.19.227
从库日志
-bash-4.2$ tail -n 100 /var/lib/pgsql/data/pg_log/postgresql-Fri.log
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 0/D000000 on timeline 1
ERROR: syntax error at or near "select" at character 35
STATEMENT: INSERT INTO yhq22 (id) VALUES (2)
select * from yhq22;
LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1 at 0/E000098.
FATAL: could not send end-of-streaming message to primary: no COPY in progress

LOG: invalid record length at 0/E000098
FATAL: could not connect to the primary server: could not connect to server: Connection refused
Is the server running on host "192.168.19.145" and accepting
TCP/IP connections on port 5432?
.....
LOG: received promote request
LOG: redo done at 0/E000028
LOG: last completed transaction was at log time 2018-12-14 09:52:53.356998+08
LOG: selected new timeline ID: 2
LOG: archive recovery complete
LOG: MultiXact member wraparound protections are now enabled
LOG: database system is ready to accept connections
LOG: autovacuum launcher started

-bash-4.2$ pg_ctl promote
服务器重新加载中
-bash-4.2$ pg_controldata /var/lib/pgsql/data/
pg_control 版本: 942
Catalog 版本: 201510051
数据库系统标识符: 6606088538881141528
数据库簇状态: 在运行中
pg_control 最后修改: 2018年12月14日 星期五 10时40分21秒
最新检查点位置: 0/E000100
优先检查点位置: 0/E000028
最新检查点的 REDO 位置: 0/E0000C8
最新检查点的重做日志文件: 00000002000000000000000E
从库正常读写
yhq1=# INSERT INTO yhq22 (id) VALUES (2);
INSERT 0 1
yhq1=# select * from yhq22;
id | date
----+---------------------
1 | 2018-12-14 09:52:51
1 | 2018-12-14 09:52:53
1 | 2018-12-14 09:52:53
2 | 2018-12-14 10:41:04
从库的文件 变成recovery.conf
-rw-r--r-- 1 postgres postgres 195 12月 14 09:41 recovery.done

原主库操作(恢复原主库为从库)
1 重新搭建新主从 ?
2 增量搭建
这时候原从库已经写入了很多其他数据
yhq1=# select count(*) from yhq22;
count
-------
8
测试1
从 192.168.19.145
新建文件recovery.conf 并修改文件postgresql.conf
-bash-4.2$ vim recovery.conf
standby_mode = 'on'
primary_conninfo = 'user=repuser password=pgreplication host=192.168.19.227 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
recovery_target_timeline = 'latest'
-bash-4.2$ vim postgresql.conf
#max_wal_senders 
#wal_keep_segments
-bash-4.2$ /usr/bin/pg_ctl start
server starting
-bash-4.2$ LOG: redirecting log output to logging collector process
HINT: Future log output will appear in directory "pg_log".

从 192.168.19.145
-bash-4.2$ pg_controldata /var/lib/pgsql/data/
pg_control version number: 942
Catalog version number: 201510051
Database system identifier: 6606088538881141528
Database cluster state: in archive recovery
pg_control last modified: Fri 14 Dec 2018 11:31:48 AM CST
Latest checkpoint location: 0/E000028
Prior checkpoint location: 0/E000028
Latest checkpoint's REDO location: 0/E000028
Latest checkpoint's REDO WAL file: 00000001000000000000000E
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0/2107
主 192.168.19.227
-bash-4.2$ pg_controldata /var/lib/pgsql/data/
pg_control 版本: 942
Catalog 版本: 201510051
数据库系统标识符: 6606088538881141528
数据库簇状态: 在运行中
pg_control 最后修改: 2018年12月14日 星期五 11时30分22秒
最新检查点位置: 0/E000E00
优先检查点位置: 0/E000D20
最新检查点的 REDO 位置: 0/E000DC8
最新检查点的重做日志文件: 00000002000000000000000E
最新检查点的 TimeLineID: 2
最新检查点的PrevTimeLineID: 2
最新检查点的full_page_writes: 开启
最新检查点的 NextXID: 0/2112
yhq1=# select * from pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priorit
y | sync_state
-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+--------------+-------+---------------+----------------+----------------+-----------------+-------------
--+------------
(0 行记录)

新从--是没有新增加的数据的
postgres=# \c yhq1
You are now connected to database "yhq1" as user "postgres".
yhq1=# select *from yhq22;
id | date
----+---------------------
1 | 2018-12-14 09:52:51
1 | 2018-12-14 09:52:53
1 | 2018-12-14 09:52:53
yhq1=# select * from pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priorit
y | sync_state
-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+--------------+-------+---------------+----------------+----------------+-----------------+-------------
--+------------
-bash-4.2$ tail -n 100 /var/lib/pgsql/data/pg_log/postgresql-Fri.log
FATAL: could not connect to the primary server: FATAL: no pg_hba.conf entry for replication connection from host "192.168.19.145", user "repuser", SSL off

FATAL: could not connect to the primary server: FATAL: no pg_hba.conf entry for replication connection from host "192.168.19.145", user "repuser", SSL off

主 192.168.19.227
-bash-4.2$ vim pg_hba.conf
host replication repuser 192.168.19.145/32 md5
-bash-4.2$ /usr/bin/pg_ctl reload

yhq1=# select * from pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay
_location | sync_priority | sync_state
------+----------+---------+------------------+----------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-------
----------+---------------+------------
2061 | 25404 | repuser | walreceiver | 192.168.19.145 | | 52776 | 2018-12-14 11:40:34.130154+08 | 2112 | streaming | 0/E001068 | 0/E001068 | 0/E001068 | 0/E001
068 | 0 | async
(1 行记录)
新从 192.168.19.145
数据同步
yhq1=# select *from yhq22;
id | date
----+---------------------
1 | 2018-12-14 09:52:51
1 | 2018-12-14 09:52:53
1 | 2018-12-14 09:52:53
2 | 2018-12-14 10:41:04
2 | 2018-12-14 11:24:06
2 | 2018-12-14 11:24:07
2 | 2018-12-14 11:24:07
2 | 2018-12-14 11:24:08
(8 rows)
-bash-4.2$ tail -n 100 /var/lib/pgsql/data/pg_log/postgresql-Fri.log
LOG: fetching timeline history file for timeline 2 from primary server
LOG: started streaming WAL from primary at 0/E000000 on timeline 1
LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1 at 0/E000098.
LOG: new target timeline is 2
LOG: restarted WAL streaming at 0/E000000 on timeline 2
LOG: redo starts at 0/E000098
-bash-4.2$ ps -ef|grep postgres
root 4627 13226 0 10:37 pts/2 00:00:00 su - postgres
postgres 4628 4627 0 10:37 pts/2 00:00:00 -bash
postgres 4806 1 0 11:31 pts/3 00:00:01 /usr/bin/postgres
postgres 4807 4806 0 11:31 ? 00:00:00 postgres: logger process
postgres 4808 4806 0 11:31 ? 00:00:00 postgres: startup process recovering 00000002000000000000000E
postgres 4809 4806 0 11:31 ? 00:00:00 postgres: checkpointer process
postgres 4810 4806 0 11:31 ? 00:00:00 postgres: writer process
postgres 4811 4806 0 11:31 ? 00:00:00 postgres: stats collector process
postgres 4861 24928 0 11:35 pts/3 00:00:00 psql
postgres 4864 4806 0 11:35 ? 00:00:00 postgres: postgres yhq1 [local] idle
postgres 4940 4806 0 11:40 ? 00:00:04 postgres: wal receiver process streaming 0/E002E40
postgres 5117 4628 0 14:31 pts/2 00:00:00 ps -ef
postgres 5118 4628 0 14:31 pts/2 00:00:00 grep --color=auto postgres
root 24927 30012 0 Dec13 pts/3 00:00:00 su - postgres
postgres 24928 24927 0 Dec13 pts/3 00:00:00 -bash
yhq1=# show synchronous_commit ;
synchronous_commit
--------------------
on
(1 row)


yhq1=# select txid_current_snapshot(); #返回主库记录点、备库记录点;主库每增加一条写入,记录点的值就会加1
txid_current_snapshot
-----------------------
2117:2117:
(1 行记录)

yhq1=# select txid_current_snapshot();
txid_current_snapshot
-----------------------
2117:2117:
(1 row)
查看备库落后主库多少个字节的wal日志
yhq1=# select pg_xlog_location_diff(pg_current_xlog_location(),replay_location) from pg_stat_replication;
pg_xlog_location_diff
-----------------------
(0 rows)