基于GTID Replication主从数据不一致操作

时间:2021-10-27 03:41:22
基本的M-S结构
 
现在master与slave主机数据一致:
 
mysql> select * from t1;
+------+
| id   |
+------+
|    1 |
|    2 |
|    4 |
+------+
3 rows in set (0.00 sec)
 
我们来模拟故障现象:
在master上,通过设置sql_log_bin来控制命令是否写入二进制日志中,运行命令:
set sql_log_bin=OFF;
insert into t1 values(5);
set sql_log_bin=ON;
insert into t1 values(6);
这样数据在master上是这样的:
mysql> select * from t1;
+------+
| id |
+------+
| 1 |
| 2 |
| 4 |
| 5 |
| 6 |
+------+
5 rows in set (0.00 sec)
而在slave上的数据是这样的:
mysql> select * from testdb.t1;
+------+
| id |
+------+
| 1 |
| 2 |
| 4 |
| 6 |
+------+
4 rows in set (0.00 sec)
此时我们在master上执行操作:
update t1 set id=7 where id=5;
insert into t1 values(8); mysql> select * from t1;
+------+
| id |
+------+
| 1 |
| 2 |
| 4 |
| 7 |
| 6 |
| 8 |
+------+
6 rows in set (0.00 sec)
我们查询一下slave上的数据:
mysql> select * from testdb.t1;
+------+
| id |
+------+
| 1 |
| 2 |
| 4 |
| 6 |
+------+
4 rows in set (0.00 sec)
发现数据并没有被同步过来,我们来show一下slave
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.203.135
Master_User: replmonitor
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: host2-bin.000002
Read_Master_Log_Pos: 1146
Relay_Log_File: host3-relay-bin.000003
Relay_Log_Pos: 648
Relay_Master_Log_File: host2-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1032
Last_Error: Could not execute Update_rows event on table testdb.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log host2-bin.000002, end_log_pos 875
Skip_Counter: 0
Exec_Master_Log_Pos: 660
Relay_Log_Space: 1595
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Update_rows event on table testdb.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log host2-bin.000002, end_log_pos 875
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Master_UUID: cd32a980-2a97-11e5-a344-000c2954ccde
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 150717 23:51:51
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: cd32a980-2a97-11e5-a344-000c2954ccde:1-3
Executed_Gtid_Set: cd32a980-2a97-11e5-a344-000c2954ccde:1
Auto_Position: 1
1 row in set (0.00 sec)
此时我们查看一下错误日志:
 
2015-07-17 23:51:51 30750 [ERROR] Slave SQL: Could not execute Update_rows event on table testdb.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log host2-bin.000002, end_log_pos 875, Error_code: 1032
2015-07-17 23:51:51 30750 [Warning] Slave: Can't find record in 't1' Error_code: 1032
2015-07-17 23:51:51 30750 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'host2-bin.000002' position 660
 
提示我们在修改错误并重新启动slave,但关键在于如何修正错误,一般我们采用如下方法:
 
在从机slave上执行:
 
mysql> stop slave;
Query OK, 0 rows affected (0.02 sec) mysql> set gtid_next='cd32a980-2a97-11e5-a344-000c2954ccde:2';
Query OK, 0 rows affected (0.00 sec) mysql> begin;
Query OK, 0 rows affected (0.00 sec) mysql> commit;
Query OK, 0 rows affected (0.00 sec) mysql> set gtid_next='AUTOMATIC';
Query OK, 0 rows affected (0.00 sec) mysql> start slave;
Query OK, 0 rows affected (0.01 sec) 即跳过有错误的GTID编号
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.203.135
Master_User: replmonitor
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: host2-bin.000002
Read_Master_Log_Pos: 1146
Relay_Log_File: host3-relay-bin.000005
Relay_Log_Pos: 688
Relay_Master_Log_File: host2-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1146
Relay_Log_Space: 1429
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Master_UUID: cd32a980-2a97-11e5-a344-000c2954ccde
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: cd32a980-2a97-11e5-a344-000c2954ccde:1-3
Executed_Gtid_Set: cd32a980-2a97-11e5-a344-000c2954ccde:1-3
Auto_Position: 1
1 row in set (0.00 sec) mysql> select * from testdb.t1;
+------+
| id |
+------+
| 1 |
| 2 |
| 4 |
| 6 |
| 8 |
+------+
5 rows in set (0.00 sec)
可以发现事务已经全部同步了,但现在数据却是不一致的。
这种适合在数据要求不高的情况下使用,如果后续还有其它数据操作,造成主从数据差异过大,就得用从主库的一个完整备份过来,再从某点开始做replication。