Percona-Tookit工具包之pt-slave-restart

时间:2022-10-06 18:46:19
 
Preface
 
    Sometimes,the threads(especially the SQL_Thread) will be terminated by accident.The slave will stop until we fix the relevant error or just skip them.There's many methods we can use to solve these kind of issues.For example,we can set "sql_salve_skip_counter" in mysql client and we can set "slave_skip_errors" in my.cnf as well(static parameter,need to restart mysqld).Further more,set "slave_exec_mode=idempotent" may also be useful.It can skip the "duplicate-key" or "no-key-found" errors.It means that we don't need to pay more attention to error 1062 and 1032 any more if we do so.Is there any other method can solve these skipping error issues?Surely there is one.
 
Introduce
 
    pt-slave-restart is the one who can skip specific errors and restart the slave automatically.But considering of the consistency of data,we'd better do think twice when skipping errors by it.There're two test below.Let's see how the pt-slave-restart works.
 
Procedure
 
Usage
1 pt-slave-restart [OPTIONS] [DSN]

 

Common parameter
 1 --always //Start slaves if there's no error,it will prevent to you stop slave manully.
 2 --check-relay-log //Check the relay log file before checking errors on slave.
 3 --error-length //Specify the max length to print.
 4 --error-number //Specify the number of errors you want to make the slave try to restart.
 5 --error-text //Specify the text of errors you want to make the slave try to restart.
 6 --max-sleep //Specify the maximum sleep time after it polling the slave.
 7 --min-sleep //Specify the minimum sleep time after it polling the slave.
 8 --monitor //whether to monitor the slave(default)
 9 --master-uuid //Specify the master-uuid you want to skip,if you have multiple writing node.
10 --recurse //If you want to execute pt-slave-restart on master.It's dispensable option.
11 --recurse-method //This option is used together with "--recurse".It specify the method to find slaves.
12 --sleep //Specify the interval of sleep time in checking the slave.
13 --until-master //run until this master logfile,position then exit.
14 --until-relay //run until this relay logfile,position then exit.

 

Example
 
1.Test of automatically restart slave.
 
Execute pt-slave-restart with "--always".
 1 [root@zlm1 12:15:35 ~]
 2 #pt-slave-restart --recurse=1 --always -h192.168.56.101 -P3306 -urepl -prepl4slave
 3 *******************************************************************
 4  Using the default of SSL_verify_mode of SSL_VERIFY_NONE for client
 5  is deprecated! Please set SSL_verify_mode to SSL_VERIFY_PEER
 6  together with SSL_ca_file|SSL_ca_path for verification.
 7  If you really don't want to verify the certificate and keep the
 8  connection open to Man-In-The-Middle attacks please set
 9  SSL_verify_mode explicitly to SSL_VERIFY_NONE in your application.
10 *******************************************************************
11   at /usr/bin/pt-slave-restart line 3841.
12 *******************************************************************
13  Using the default of SSL_verify_mode of SSL_VERIFY_NONE for client
14  is deprecated! Please set SSL_verify_mode to SSL_VERIFY_PEER
15  together with SSL_ca_file|SSL_ca_path for verification.
16  If you really don't want to verify the certificate and keep the
17  connection open to Man-In-The-Middle attacks please set
18  SSL_verify_mode explicitly to SSL_VERIFY_NONE in your application.
19 *******************************************************************
20   at /usr/bin/pt-slave-restart line 3841.
21 
22 # A software update is available:
23 2018-07-21T12:18:15 P=3306,h=192.168.56.101,p=...,u=repl relay-bin.000030         407 0
24 
25 //It has begun to monitor the slave to automatically startup.

 

Check whether the slave replication is normal.
 1 zlm@192.168.56.101:3306 [(none)]>show slave status\G
 2 *************************** 1. row ***************************
 3                Slave_IO_State: Waiting for master to send event
 4                   Master_Host: 192.168.56.100
 5                   Master_User: repl
 6                   Master_Port: 3306
 7                 Connect_Retry: 60
 8               Master_Log_File: mysql-bin.000091
 9           Read_Master_Log_Pos: 194
10                Relay_Log_File: relay-bin.000031
11                 Relay_Log_Pos: 407
12         Relay_Master_Log_File: mysql-bin.000091
13              Slave_IO_Running: Yes
14             Slave_SQL_Running: Yes
15               Replicate_Do_DB: 
16           Replicate_Ignore_DB: 
17            Replicate_Do_Table: 
18        Replicate_Ignore_Table: 
19       Replicate_Wild_Do_Table: 
20   Replicate_Wild_Ignore_Table: 
21                    Last_Errno: 0
22                    Last_Error: 
23                  Skip_Counter: 0
24           Exec_Master_Log_Pos: 194
25               Relay_Log_Space: 861
26               Until_Condition: None
27                Until_Log_File: 
28                 Until_Log_Pos: 0
29            Master_SSL_Allowed: No
30            Master_SSL_CA_File: 
31            Master_SSL_CA_Path: 
32               Master_SSL_Cert: 
33             Master_SSL_Cipher: 
34                Master_SSL_Key: 
35         Seconds_Behind_Master: 0
36 Master_SSL_Verify_Server_Cert: No
37                 Last_IO_Errno: 0
38                 Last_IO_Error: 
39                Last_SQL_Errno: 0
40                Last_SQL_Error: 
41   Replicate_Ignore_Server_Ids: 
42              Master_Server_Id: 1003306
43                   Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e
44              Master_Info_File: /data/mysql/mysql3306/data/master.info
45                     SQL_Delay: 0
46           SQL_Remaining_Delay: NULL
47       Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
48            Master_Retry_Count: 86400
49                   Master_Bind: 
50       Last_IO_Error_Timestamp: 
51      Last_SQL_Error_Timestamp: 
52                Master_SSL_Crl: 
53            Master_SSL_Crlpath: 
54            Retrieved_Gtid_Set: 2a4b3562-2ab6-11e8-be7a-080027de0e0e:12715627
55             Executed_Gtid_Set: 27af30ca-6800-11e8-ad7e-080027de0e0e:1,
56 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715627
57                 Auto_Position: 1
58          Replicate_Rewrite_DB: 
59                  Channel_Name: 
60            Master_TLS_Version: 
61 1 row in set (0.00 sec)

 

Stop slave replication.
  1 zlm@192.168.56.101:3306 [(none)]>stop slave;
  2 Query OK, 0 rows affected (0.03 sec)
  3 
  4 zlm@192.168.56.101:3306 [(none)]>show slave status\G
  5 *************************** 1. row ***************************
  6                Slave_IO_State: 
  7                   Master_Host: 192.168.56.100
  8                   Master_User: repl
  9                   Master_Port: 3306
 10                 Connect_Retry: 60
 11               Master_Log_File: mysql-bin.000091
 12           Read_Master_Log_Pos: 194
 13                Relay_Log_File: relay-bin.000031
 14                 Relay_Log_Pos: 407
 15         Relay_Master_Log_File: mysql-bin.000091
 16              Slave_IO_Running: No  //IO_Thread has stopped.
 17             Slave_SQL_Running: No  //SQL_Thread has stopped.
 18               Replicate_Do_DB: 
 19           Replicate_Ignore_DB: 
 20            Replicate_Do_Table: 
 21        Replicate_Ignore_Table: 
 22       Replicate_Wild_Do_Table: 
 23   Replicate_Wild_Ignore_Table: 
 24                    Last_Errno: 0
 25                    Last_Error: 
 26                  Skip_Counter: 0
 27           Exec_Master_Log_Pos: 194
 28               Relay_Log_Space: 861
 29               Until_Condition: None
 30                Until_Log_File: 
 31                 Until_Log_Pos: 0
 32            Master_SSL_Allowed: No
 33            Master_SSL_CA_File: 
 34            Master_SSL_CA_Path: 
 35               Master_SSL_Cert: 
 36             Master_SSL_Cipher: 
 37                Master_SSL_Key: 
 38         Seconds_Behind_Master: NULL
 39 Master_SSL_Verify_Server_Cert: No
 40                 Last_IO_Errno: 0
 41                 Last_IO_Error: 
 42                Last_SQL_Errno: 0
 43                Last_SQL_Error: 
 44   Replicate_Ignore_Server_Ids: 
 45              Master_Server_Id: 1003306
 46                   Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e
 47              Master_Info_File: /data/mysql/mysql3306/data/master.info
 48                     SQL_Delay: 0
 49           SQL_Remaining_Delay: NULL
 50       Slave_SQL_Running_State: 
 51            Master_Retry_Count: 86400
 52                   Master_Bind: 
 53       Last_IO_Error_Timestamp: 
 54      Last_SQL_Error_Timestamp: 
 55                Master_SSL_Crl: 
 56            Master_SSL_Crlpath: 
 57            Retrieved_Gtid_Set: 2a4b3562-2ab6-11e8-be7a-080027de0e0e:12715627
 58             Executed_Gtid_Set: 27af30ca-6800-11e8-ad7e-080027de0e0e:1,
 59 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715627
 60                 Auto_Position: 1
 61          Replicate_Rewrite_DB: 
 62                  Channel_Name: 
 63            Master_TLS_Version: 
 64 1 row in set (0.00 sec)
 65 
 66 //Wait for a while and check it again.
 67 
 68 zlm@192.168.56.101:3306 [(none)]>show slave status\G
 69 *************************** 1. row ***************************
 70                Slave_IO_State: Waiting for master to send event
 71                   Master_Host: 192.168.56.100
 72                   Master_User: repl
 73                   Master_Port: 3306
 74                 Connect_Retry: 60
 75               Master_Log_File: mysql-bin.000091
 76           Read_Master_Log_Pos: 194
 77                Relay_Log_File: relay-bin.000032
 78                 Relay_Log_Pos: 407
 79         Relay_Master_Log_File: mysql-bin.000091
 80              Slave_IO_Running: Yes //IO_Thread has started again.
 81             Slave_SQL_Running: Yes //SQL_Thread has started again.
 82               Replicate_Do_DB: 
 83           Replicate_Ignore_DB: 
 84            Replicate_Do_Table: 
 85        Replicate_Ignore_Table: 
 86       Replicate_Wild_Do_Table: 
 87   Replicate_Wild_Ignore_Table: 
 88                    Last_Errno: 0
 89                    Last_Error: 
 90                  Skip_Counter: 0
 91           Exec_Master_Log_Pos: 194
 92               Relay_Log_Space: 861
 93               Until_Condition: None
 94                Until_Log_File: 
 95                 Until_Log_Pos: 0
 96            Master_SSL_Allowed: No
 97            Master_SSL_CA_File: 
 98            Master_SSL_CA_Path: 
 99               Master_SSL_Cert: 
100             Master_SSL_Cipher: 
101                Master_SSL_Key: 
102         Seconds_Behind_Master: 0
103 Master_SSL_Verify_Server_Cert: No
104                 Last_IO_Errno: 0
105                 Last_IO_Error: 
106                Last_SQL_Errno: 0
107                Last_SQL_Error: 
108   Replicate_Ignore_Server_Ids: 
109              Master_Server_Id: 1003306
110                   Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e
111              Master_Info_File: /data/mysql/mysql3306/data/master.info
112                     SQL_Delay: 0
113           SQL_Remaining_Delay: NULL
114       Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
115            Master_Retry_Count: 86400
116                   Master_Bind: 
117       Last_IO_Error_Timestamp: 
118      Last_SQL_Error_Timestamp: 
119                Master_SSL_Crl: 
120            Master_SSL_Crlpath: 
121            Retrieved_Gtid_Set: 2a4b3562-2ab6-11e8-be7a-080027de0e0e:12715627
122             Executed_Gtid_Set: 27af30ca-6800-11e8-ad7e-080027de0e0e:1,
123 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715627
124                 Auto_Position: 1
125          Replicate_Rewrite_DB: 
126                  Channel_Name: 
127            Master_TLS_Version: 
128 1 row in set (0.00 sec)

 

The output is added one line which shows the connection status of slave on master.
1 # A software update is available:
2 2018-07-21T12:18:15 P=3306,h=192.168.56.101,p=...,u=repl relay-bin.000030         407 0 
3 2018-07-21T12:21:27 P=3306,h=192.168.56.101,p=...,u=repl relay-bin.000031         407 0

 

Check the general log of slave to see the details.
 1 [root@zlm2 12:21:46 /data/mysql/mysql3306/data]
 2 #tail -f zlm2.log
 3 mysqld, Version: 5.7.21-log (MySQL Community Server (GPL)). started with:
 4 Tcp port: 3306  Unix socket: /var/lib/mysql/mysql.sock
 5 Time                 Id Command    Argument
 6 2018-07-21T10:21:14.049221Z       30 Query    stop slave
 7 2018-07-21T10:21:21.598248Z       30 Query    show slave status
 8 2018-07-21T10:21:27.080665Z       31 Query    SHOW SLAVE STATUS
 9 2018-07-21T10:21:27.081848Z       31 Query    START SLAVE
10 2018-07-21T10:21:27.084221Z       36 Connect Out    repl@192.168.56.100:3306
11 2018-07-21T10:21:28.086327Z       31 Query    SHOW SLAVE STATUS
12 2018-07-21T10:21:46.570310Z       35 Quit    
13 2018-07-21T10:22:02.782934Z       38 Connect    zlm@zlm2 on  using TCP/IP
14 2018-07-21T10:22:02.784208Z       38 Query    select @@version_comment limit 1
15 2018-07-21T10:22:02.785102Z       38 Query    select USER()
16 2018-07-21T10:22:10.440909Z       38 Query    show slave status
17 2018-07-21T10:22:14.535233Z       38 Query    show slave status
18 2018-07-21T10:22:32.087271Z       31 Query    SHOW SLAVE STATUS
19 ...

 

2.Test of skipping error of  replication.
 
Do some operations on master.
 1 zlm@192.168.56.100:3306 [zlm]>show tables;
 2 +---------------+
 3 | Tables_in_zlm |
 4 +---------------+
 5 | test_gbk      |
 6 | test_utf8     |
 7 | test_utf8mb4  |
 8 +---------------+
 9 3 rows in set (0.00 sec)
10 
11 zlm@192.168.56.100:3306 [zlm]>drop table test_gbk,test_utf8,test_utf8mb4;
12 Query OK, 0 rows affected (0.04 sec)
13 
14 zlm@192.168.56.100:3306 [zlm]>show tables;
15 Empty set (0.00 sec)
16 
17 zlm@192.168.56.100:3306 [zlm]>create table test(
18     -> id int,
19     -> name varchar(10)
20     -> ) engine=innodb;
21 Query OK, 0 rows affected (0.03 sec)
22 
23 zlm@192.168.56.100:3306 [zlm]>alter table test add primary key(id);
24 Query OK, 0 rows affected (0.01 sec)
25 Records: 0  Duplicates: 0  Warnings: 0
26 
27 zlm@192.168.56.100:3306 [zlm]>alter table test modify column id int unsigned not null auto_increment;
28 Query OK, 0 rows affected (0.02 sec)
29 Records: 0  Duplicates: 0  Warnings: 0
30 
31 zlm@192.168.56.100:3306 [zlm]>show create table test\G
32 *************************** 1. row ***************************
33        Table: test
34 Create Table: CREATE TABLE `test` (
35   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
36   `name` varchar(10) DEFAULT NULL,
37   PRIMARY KEY (`id`)
38 ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8
39 1 row in set (0.00 sec)
40 
41 zlm@192.168.56.100:3306 [zlm]>insert into test values(1,'zlm');
42 Query OK, 1 row affected (0.00 sec)
43 
44 zlm@192.168.56.100:3306 [zlm]>select * from test;
45 +----+------+
46 | id | name |
47 +----+------+
48 |  1 | zlm  |
49 +----+------+
50 1 row in set (0.00 sec)

 

Check the slave status on slave.
 1 zlm@192.168.56.101:3306 [zlm]>show slave status\G
 2 *************************** 1. row ***************************
 3                Slave_IO_State: Waiting for master to send event
 4                   Master_Host: 192.168.56.100
 5                   Master_User: repl
 6                   Master_Port: 3306
 7                 Connect_Retry: 60
 8               Master_Log_File: mysql-bin.000091
 9           Read_Master_Log_Pos: 1241
10                Relay_Log_File: relay-bin.000032
11                 Relay_Log_Pos: 407
12         Relay_Master_Log_File: mysql-bin.000091
13              Slave_IO_Running: Yes
14             Slave_SQL_Running: No
15               Replicate_Do_DB: 
16           Replicate_Ignore_DB: 
17            Replicate_Do_Table: 
18        Replicate_Ignore_Table: 
19       Replicate_Wild_Do_Table: 
20   Replicate_Wild_Ignore_Table: 
21                    Last_Errno: 1051
22                    Last_Error: Error 'Unknown table 'zlm.test_gbk,zlm.test_utf8,zlm.test_utf8mb4'' on query. Default database: 'zlm'. Query: 'DROP TABLE `test_gbk`,`test_utf8`,`test_utf8mb4` /* generated by server */'
23                  Skip_Counter: 0
24           Exec_Master_Log_Pos: 194
25               Relay_Log_Space: 1908
26               Until_Condition: None
27                Until_Log_File: 
28                 Until_Log_Pos: 0
29            Master_SSL_Allowed: No
30            Master_SSL_CA_File: 
31            Master_SSL_CA_Path: 
32               Master_SSL_Cert: 
33             Master_SSL_Cipher: 
34                Master_SSL_Key: 
35         Seconds_Behind_Master: NULL
36 Master_SSL_Verify_Server_Cert: No
37                 Last_IO_Errno: 0
38                 Last_IO_Error: 
39                Last_SQL_Errno: 1051
40                Last_SQL_Error: Error 'Unknown table 'zlm.test_gbk,zlm.test_utf8,zlm.test_utf8mb4'' on query. Default database: 'zlm'. Query: 'DROP TABLE `test_gbk`,`test_utf8`,`test_utf8mb4` /* generated by server */'
41   Replicate_Ignore_Server_Ids: 
42              Master_Server_Id: 1003306
43                   Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e
44              Master_Info_File: /data/mysql/mysql3306/data/master.info
45                     SQL_Delay: 0
46           SQL_Remaining_Delay: NULL
47       Slave_SQL_Running_State: 
48            Master_Retry_Count: 86400
49                   Master_Bind: 
50       Last_IO_Error_Timestamp: 
51      Last_SQL_Error_Timestamp: 180721 13:19:25
52                Master_SSL_Crl: 
53            Master_SSL_Crlpath: 
54            Retrieved_Gtid_Set: 2a4b3562-2ab6-11e8-be7a-080027de0e0e:12715627-12715632
55             Executed_Gtid_Set: 27af30ca-6800-11e8-ad7e-080027de0e0e:1,
56 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715627
57                 Auto_Position: 1
58          Replicate_Rewrite_DB: 
59                  Channel_Name: 
60            Master_TLS_Version: 
61 1 row in set (0.00 sec)
62 
63 zlm@192.168.56.101:3306 [zlm]>show tables;
64 +----------------+
65 | Tables_in_zlm  |
66 +----------------+
67 | semi_sync_test |
68 | t1             |
69 | t2             |
70 | t3             |
71 | test           |
72 | test_flashbk   |
73 +----------------+
74 6 rows in set (0.00 sec)
75 
76 //There're no target tables at all.So error 1051 occurs.

 

Try to skip the error 1051 on master.
 
1 [root@zlm1 13:18:45 ~]
2 #pt-slave-restart --recurse=0 --error-number=1051 -h192.168.56.101 -P3306 -urepl -prepl4slave
3 2018-07-21T13:35:31 P=3306,h=192.168.56.101,p=...,u=repl relay-bin.000032         407 1051 
4 2018-07-21T13:35:31 P=3306,h=192.168.56.101,p=...,u=repl relay-bin.000032         618 1050 
5 Error 1050 is not in --error-numbers. //error 1050 occurs then.

 

Check replicaiton slave status again.
 1 zlm@192.168.56.101:3306 [zlm]>show slave status\G
 2 *************************** 1. row ***************************
 3                Slave_IO_State: Waiting for master to send event
 4                   Master_Host: 192.168.56.100
 5                   Master_User: repl
 6                   Master_Port: 3306
 7                 Connect_Retry: 60
 8               Master_Log_File: mysql-bin.000091
 9           Read_Master_Log_Pos: 1241
10                Relay_Log_File: relay-bin.000032
11                 Relay_Log_Pos: 618
12         Relay_Master_Log_File: mysql-bin.000091
13              Slave_IO_Running: Yes
14             Slave_SQL_Running: No
15               Replicate_Do_DB: 
16           Replicate_Ignore_DB: 
17            Replicate_Do_Table: 
18        Replicate_Ignore_Table: 
19       Replicate_Wild_Do_Table: 
20   Replicate_Wild_Ignore_Table: 
21                    Last_Errno: 1050
22                    Last_Error: Error 'Table 'test' already exists' on query. Default database: 'zlm'. Query: 'create table test(
23 id int,
24 name varchar(10)
25 ) engine=innodb'
26                  Skip_Counter: 0
27           Exec_Master_Log_Pos: 405
28               Relay_Log_Space: 1908
29               Until_Condition: None
30                Until_Log_File: 
31                 Until_Log_Pos: 0
32            Master_SSL_Allowed: No
33            Master_SSL_CA_File: 
34            Master_SSL_CA_Path: 
35               Master_SSL_Cert: 
36             Master_SSL_Cipher: 
37                Master_SSL_Key: 
38         Seconds_Behind_Master: NULL
39 Master_SSL_Verify_Server_Cert: No
40                 Last_IO_Errno: 0
41                 Last_IO_Error: 
42                Last_SQL_Errno: 1050
43                Last_SQL_Error: Error 'Table 'test' already exists' on query. Default database: 'zlm'. Query: 'create table test(
44 id int,
45 name varchar(10)
46 ) engine=innodb'
47   Replicate_Ignore_Server_Ids: 
48              Master_Server_Id: 1003306
49                   Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e
50              Master_Info_File: /data/mysql/mysql3306/data/master.info
51                     SQL_Delay: 0
52           SQL_Remaining_Delay: NULL
53       Slave_SQL_Running_State: 
54            Master_Retry_Count: 86400
55                   Master_Bind: 
56       Last_IO_Error_Timestamp: 
57      Last_SQL_Error_Timestamp: 180721 13:35:31
58                Master_SSL_Crl: 
59            Master_SSL_Crlpath: 
60            Retrieved_Gtid_Set: 2a4b3562-2ab6-11e8-be7a-080027de0e0e:12715627-12715632
61             Executed_Gtid_Set: 27af30ca-6800-11e8-ad7e-080027de0e0e:1,
62 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715628
63                 Auto_Position: 1
64          Replicate_Rewrite_DB: 
65                  Channel_Name: 
66            Master_TLS_Version: 
67 1 row in set (0.00 sec)

 

Continue to skip error 1050 on master.
1 [root@zlm1 13:35:31 ~]
2 #pt-slave-restart --recurse=0 --error-number=1050 -h192.168.56.101 -P3306 -urepl -prepl4slave
3 2018-07-21T13:39:43 P=3306,h=192.168.56.101,p=...,u=repl relay-bin.000032         618 1050 
4 2018-07-21T13:39:44 P=3306,h=192.168.56.101,p=...,u=repl relay-bin.000032         814 1068 
5 Error 1068 is not in --error-numbers.

 

Check the slave status one more time.
 1 zlm@192.168.56.101:3306 [zlm]>show slave status\G
 2 *************************** 1. row ***************************
 3                Slave_IO_State: Waiting for master to send event
 4                   Master_Host: 192.168.56.100
 5                   Master_User: repl
 6                   Master_Port: 3306
 7                 Connect_Retry: 60
 8               Master_Log_File: mysql-bin.000091
 9           Read_Master_Log_Pos: 1241
10                Relay_Log_File: relay-bin.000032
11                 Relay_Log_Pos: 814
12         Relay_Master_Log_File: mysql-bin.000091
13              Slave_IO_Running: Yes
14             Slave_SQL_Running: No
15               Replicate_Do_DB: 
16           Replicate_Ignore_DB: 
17            Replicate_Do_Table: 
18        Replicate_Ignore_Table: 
19       Replicate_Wild_Do_Table: 
20   Replicate_Wild_Ignore_Table: 
21                    Last_Errno: 1068
22                    Last_Error: Error 'Multiple primary key defined' on query. Default database: 'zlm'. Query: 'alter table test add primary key(id)'
23                  Skip_Counter: 0
24           Exec_Master_Log_Pos: 601
25               Relay_Log_Space: 1908
26               Until_Condition: None
27                Until_Log_File: 
28                 Until_Log_Pos: 0
29            Master_SSL_Allowed: No
30            Master_SSL_CA_File: 
31            Master_SSL_CA_Path: 
32               Master_SSL_Cert: 
33             Master_SSL_Cipher: 
34                Master_SSL_Key: 
35         Seconds_Behind_Master: NULL
36 Master_SSL_Verify_Server_Cert: No
37                 Last_IO_Errno: 0
38                 Last_IO_Error: 
39                Last_SQL_Errno: 1068
40                Last_SQL_Error: Error 'Multiple primary key defined' on query. Default database: 'zlm'. Query: 'alter table test add primary key(id)'
41   Replicate_Ignore_Server_Ids: 
42              Master_Server_Id: 1003306
43                   Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e
44              Master_Info_File: /data/mysql/mysql3306/data/master.info
45                     SQL_Delay: 0
46           SQL_Remaining_Delay: NULL
47       Slave_SQL_Running_State: 
48            Master_Retry_Count: 86400
49                   Master_Bind: 
50       Last_IO_Error_Timestamp: 
51      Last_SQL_Error_Timestamp: 180721 13:39:43
52                Master_SSL_Crl: 
53            Master_SSL_Crlpath: 
54            Retrieved_Gtid_Set: 2a4b3562-2ab6-11e8-be7a-080027de0e0e:12715627-12715632
55             Executed_Gtid_Set: 27af30ca-6800-11e8-ad7e-080027de0e0e:1,
56 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715629
57                 Auto_Position: 1
58          Replicate_Rewrite_DB: 
59                  Channel_Name: 
60            Master_TLS_Version: 
61 1 row in set (0.00 sec)
62 
63 zlm@192.168.56.101:3306 [zlm]>show create table test\G
64 *************************** 1. row ***************************
65        Table: test
66 Create Table: CREATE TABLE `test` (
67   `id` bigint(20) NOT NULL AUTO_INCREMENT,
68   `name` varchar(20) NOT NULL DEFAULT '',
69   PRIMARY KEY (`id`) //The table "test" had been added primary key what hit the error.
70 ) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8mb4
71 1 row in set (0.00 sec)

 

Continue to skip error 1068 on master.
1 [root@zlm1 13:39:44 ~]
2 #pt-slave-restart --recurse=0 --error-number=1068 -h192.168.56.101 -P3306 -urepl -prepl4slave
3 2018-07-21T13:50:39 P=3306,h=192.168.56.101,p=...,u=repl relay-bin.000032         814 1068 
4 2018-07-21T13:50:42 P=3306,h=192.168.56.101,p=...,u=repl relay-bin.000032        1194 1677 
5 Error 1677 is not in --error-numbers.

 

Check the slave status the fourth times.
 1 zlm@192.168.56.101:3306 [zlm]>show slave status\G
 2 *************************** 1. row ***************************
 3                Slave_IO_State: Waiting for master to send event
 4                   Master_Host: 192.168.56.100
 5                   Master_User: repl
 6                   Master_Port: 3306
 7                 Connect_Retry: 60
 8               Master_Log_File: mysql-bin.000091
 9           Read_Master_Log_Pos: 1241
10                Relay_Log_File: relay-bin.000032
11                 Relay_Log_Pos: 1194
12         Relay_Master_Log_File: mysql-bin.000091
13              Slave_IO_Running: Yes
14             Slave_SQL_Running: No
15               Replicate_Do_DB: 
16           Replicate_Ignore_DB: 
17            Replicate_Do_Table: 
18        Replicate_Ignore_Table: 
19       Replicate_Wild_Do_Table: 
20   Replicate_Wild_Ignore_Table: 
21                    Last_Errno: 1677
22                    Last_Error: Column 1 of table 'zlm.test' cannot be converted from type 'varchar(30(bytes))' to type 'varchar(80(bytes) utf8mb4)'
23                  Skip_Counter: 0
24           Exec_Master_Log_Pos: 981
25               Relay_Log_Space: 1908
26               Until_Condition: None
27                Until_Log_File: 
28                 Until_Log_Pos: 0
29            Master_SSL_Allowed: No
30            Master_SSL_CA_File: 
31            Master_SSL_CA_Path: 
32               Master_SSL_Cert: 
33             Master_SSL_Cipher: 
34                Master_SSL_Key: 
35         Seconds_Behind_Master: NULL
36 Master_SSL_Verify_Server_Cert: No
37                 Last_IO_Errno: 0
38                 Last_IO_Error: 
39                Last_SQL_Errno: 1677
40                Last_SQL_Error: Column 1 of table 'zlm.test' cannot be converted from type 'varchar(30(bytes))' to type 'varchar(80(bytes) utf8mb4)'
41   Replicate_Ignore_Server_Ids: 
42              Master_Server_Id: 1003306
43                   Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e
44              Master_Info_File: /data/mysql/mysql3306/data/master.info
45                     SQL_Delay: 0
46           SQL_Remaining_Delay: NULL
47       Slave_SQL_Running_State: 
48            Master_Retry_Count: 86400
49                   Master_Bind: 
50       Last_IO_Error_Timestamp: 
51      Last_SQL_Error_Timestamp: 180721 13:50:42
52                Master_SSL_Crl: 
53            Master_SSL_Crlpath: 
54            Retrieved_Gtid_Set: 2a4b3562-2ab6-11e8-be7a-080027de0e0e:12715627-12715632
55             Executed_Gtid_Set: 27af30ca-6800-11e8-ad7e-080027de0e0e:1,
56 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715631
57                 Auto_Position: 1
58          Replicate_Rewrite_DB: 
59                  Channel_Name: 
60            Master_TLS_Version: 
61 1 row in set (0.00 sec)
62 
63 //Table "test" on master is utf8 menas its column varchar(10) contain 30 bytes
64 //Table "test" on slave is utf8mb4 means its column varchar(20) contain 80 bytes.
65 //They cannot be converted.Thus the error 1677 occurs.

 

Continue to skip error 1677 on master.
1 [root@zlm1 13:50:42 ~]
2 #pt-slave-restart --recurse=0 --error-number=1677 -h192.168.56.101 -P3306 -urepl -prepl4slave
3 2018-07-21T13:59:24 P=3306,h=192.168.56.101,p=...,u=repl relay-bin.000032        1194 1677 
4 
5 //No more error appears anymore.

 

Check the slave status the fifth times.
 1 zlm@192.168.56.101:3306 [zlm]>show slave status\G
 2 *************************** 1. row ***************************
 3                Slave_IO_State: Waiting for master to send event
 4                   Master_Host: 192.168.56.100
 5                   Master_User: repl
 6                   Master_Port: 3306
 7                 Connect_Retry: 60
 8               Master_Log_File: mysql-bin.000091
 9           Read_Master_Log_Pos: 1241
10                Relay_Log_File: relay-bin.000032
11                 Relay_Log_Pos: 1454
12         Relay_Master_Log_File: mysql-bin.000091
13              Slave_IO_Running: Yes
14             Slave_SQL_Running: Yes //The SQL_Thread is normal now.No errors found.
15               Replicate_Do_DB: 
16           Replicate_Ignore_DB: 
17            Replicate_Do_Table: 
18        Replicate_Ignore_Table: 
19       Replicate_Wild_Do_Table: 
20   Replicate_Wild_Ignore_Table: 
21                    Last_Errno: 0
22                    Last_Error: 
23                  Skip_Counter: 0
24           Exec_Master_Log_Pos: 1241
25               Relay_Log_Space: 1908
26               Until_Condition: None
27                Until_Log_File: 
28                 Until_Log_Pos: 0
29            Master_SSL_Allowed: No
30            Master_SSL_CA_File: 
31            Master_SSL_CA_Path: 
32               Master_SSL_Cert: 
33             Master_SSL_Cipher: 
34                Master_SSL_Key: 
35         Seconds_Behind_Master: 0
36 Master_SSL_Verify_Server_Cert: No
37                 Last_IO_Errno: 0
38                 Last_IO_Error: 
39                Last_SQL_Errno: 0
40                Last_SQL_Error: 
41   Replicate_Ignore_Server_Ids: 
42              Master_Server_Id: 1003306
43                   Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e
44              Master_Info_File: /data/mysql/mysql3306/data/master.info
45                     SQL_Delay: 0
46           SQL_Remaining_Delay: NULL
47       Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
48            Master_Retry_Count: 86400
49                   Master_Bind: 
50       Last_IO_Error_Timestamp: 
51      Last_SQL_Error_Timestamp: 
52                Master_SSL_Crl: 
53            Master_SSL_Crlpath: 
54            Retrieved_Gtid_Set: 2a4b3562-2ab6-11e8-be7a-080027de0e0e:12715627-12715632
55             Executed_Gtid_Set: 27af30ca-6800-11e8-ad7e-080027de0e0e:1,
56 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715632
57                 Auto_Position: 1
58          Replicate_Rewrite_DB: 
59                  Channel_Name: 
60            Master_TLS_Version: 
61 1 row in set (0.01 sec)
62 
63 //Util now,we've skipped all the errors dynamically.But it doesn't guarantee the consistency of table "test" at all.

 

See what will happen on test of slave.
 1 zlm@192.168.56.101:3306 [zlm]>show create table test\G
 2 *************************** 1. row ***************************
 3        Table: test
 4 Create Table: CREATE TABLE `test` (
 5   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 6   `name` varchar(20) NOT NULL DEFAULT '',
 7   PRIMARY KEY (`id`)
 8 ) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8mb4
 9 1 row in set (0.00 sec)
10 
11 zlm@192.168.56.101:3306 [zlm]>select count(*) from test;
12 +----------+
13 | count(*) |
14 +----------+
15 |   100000 |
16 +----------+
17 1 row in set (0.02 sec)
18 
19 //No change happened.

 

Summary
  • pt-slave-restart is tool use to avoid slave replication terminated accidentally due to different errors.
  • On account of "skip-slave-errors" is a static parameter,pt-slave-restart is more convenient to skip errors dynamically.
  • Unless you know the influence or consequences of the skipping operation.Don't skip any errors automatically.It's not safe to keep data consistency.
  • It's not recommended to use parameter "--aways",factitiously check and restart is more reliable.