Preface
Sometimes,the threads(especially the SQL_Thread) will be terminated by accident.The slave will stop until we fix the relevant error or just skip them.There's many methods we can use to solve these kind of issues.For example,we can set "sql_salve_skip_counter" in mysql client and we can set "slave_skip_errors" in my.cnf as well(static parameter,need to restart mysqld).Further more,set "slave_exec_mode=idempotent" may also be useful.It can skip the "duplicate-key" or "no-key-found" errors.It means that we don't need to pay more attention to error 1062 and 1032 any more if we do so.Is there any other method can solve these skipping error issues?Surely there is one.
Introduce
pt-slave-restart is the one who can skip specific errors and restart the slave automatically.But considering of the consistency of data,we'd better do think twice when skipping errors by it.There're two test below.Let's see how the pt-slave-restart works.
Procedure
Usage
1 pt-slave-restart [OPTIONS] [DSN]
Common parameter
1 --always //Start slaves if there's no error,it will prevent to you stop slave manully. 2 --check-relay-log //Check the relay log file before checking errors on slave. 3 --error-length //Specify the max length to print. 4 --error-number //Specify the number of errors you want to make the slave try to restart. 5 --error-text //Specify the text of errors you want to make the slave try to restart. 6 --max-sleep //Specify the maximum sleep time after it polling the slave. 7 --min-sleep //Specify the minimum sleep time after it polling the slave. 8 --monitor //whether to monitor the slave(default) 9 --master-uuid //Specify the master-uuid you want to skip,if you have multiple writing node. 10 --recurse //If you want to execute pt-slave-restart on master.It's dispensable option. 11 --recurse-method //This option is used together with "--recurse".It specify the method to find slaves. 12 --sleep //Specify the interval of sleep time in checking the slave. 13 --until-master //run until this master logfile,position then exit. 14 --until-relay //run until this relay logfile,position then exit.
Example
1.Test of automatically restart slave.
Execute pt-slave-restart with "--always".
1 [root@zlm1 12:15:35 ~] 2 #pt-slave-restart --recurse=1 --always -h192.168.56.101 -P3306 -urepl -prepl4slave 3 ******************************************************************* 4 Using the default of SSL_verify_mode of SSL_VERIFY_NONE for client 5 is deprecated! Please set SSL_verify_mode to SSL_VERIFY_PEER 6 together with SSL_ca_file|SSL_ca_path for verification. 7 If you really don't want to verify the certificate and keep the 8 connection open to Man-In-The-Middle attacks please set 9 SSL_verify_mode explicitly to SSL_VERIFY_NONE in your application. 10 ******************************************************************* 11 at /usr/bin/pt-slave-restart line 3841. 12 ******************************************************************* 13 Using the default of SSL_verify_mode of SSL_VERIFY_NONE for client 14 is deprecated! Please set SSL_verify_mode to SSL_VERIFY_PEER 15 together with SSL_ca_file|SSL_ca_path for verification. 16 If you really don't want to verify the certificate and keep the 17 connection open to Man-In-The-Middle attacks please set 18 SSL_verify_mode explicitly to SSL_VERIFY_NONE in your application. 19 ******************************************************************* 20 at /usr/bin/pt-slave-restart line 3841. 21 22 # A software update is available: 23 2018-07-21T12:18:15 P=3306,h=192.168.56.101,p=...,u=repl relay-bin.000030 407 0 24 25 //It has begun to monitor the slave to automatically startup.
Check whether the slave replication is normal.
1 zlm@192.168.56.101:3306 [(none)]>show slave status\G 2 *************************** 1. row *************************** 3 Slave_IO_State: Waiting for master to send event 4 Master_Host: 192.168.56.100 5 Master_User: repl 6 Master_Port: 3306 7 Connect_Retry: 60 8 Master_Log_File: mysql-bin.000091 9 Read_Master_Log_Pos: 194 10 Relay_Log_File: relay-bin.000031 11 Relay_Log_Pos: 407 12 Relay_Master_Log_File: mysql-bin.000091 13 Slave_IO_Running: Yes 14 Slave_SQL_Running: Yes 15 Replicate_Do_DB: 16 Replicate_Ignore_DB: 17 Replicate_Do_Table: 18 Replicate_Ignore_Table: 19 Replicate_Wild_Do_Table: 20 Replicate_Wild_Ignore_Table: 21 Last_Errno: 0 22 Last_Error: 23 Skip_Counter: 0 24 Exec_Master_Log_Pos: 194 25 Relay_Log_Space: 861 26 Until_Condition: None 27 Until_Log_File: 28 Until_Log_Pos: 0 29 Master_SSL_Allowed: No 30 Master_SSL_CA_File: 31 Master_SSL_CA_Path: 32 Master_SSL_Cert: 33 Master_SSL_Cipher: 34 Master_SSL_Key: 35 Seconds_Behind_Master: 0 36 Master_SSL_Verify_Server_Cert: No 37 Last_IO_Errno: 0 38 Last_IO_Error: 39 Last_SQL_Errno: 0 40 Last_SQL_Error: 41 Replicate_Ignore_Server_Ids: 42 Master_Server_Id: 1003306 43 Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e 44 Master_Info_File: /data/mysql/mysql3306/data/master.info 45 SQL_Delay: 0 46 SQL_Remaining_Delay: NULL 47 Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates 48 Master_Retry_Count: 86400 49 Master_Bind: 50 Last_IO_Error_Timestamp: 51 Last_SQL_Error_Timestamp: 52 Master_SSL_Crl: 53 Master_SSL_Crlpath: 54 Retrieved_Gtid_Set: 2a4b3562-2ab6-11e8-be7a-080027de0e0e:12715627 55 Executed_Gtid_Set: 27af30ca-6800-11e8-ad7e-080027de0e0e:1, 56 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715627 57 Auto_Position: 1 58 Replicate_Rewrite_DB: 59 Channel_Name: 60 Master_TLS_Version: 61 1 row in set (0.00 sec)
Stop slave replication.
1 zlm@192.168.56.101:3306 [(none)]>stop slave; 2 Query OK, 0 rows affected (0.03 sec) 3 4 zlm@192.168.56.101:3306 [(none)]>show slave status\G 5 *************************** 1. row *************************** 6 Slave_IO_State: 7 Master_Host: 192.168.56.100 8 Master_User: repl 9 Master_Port: 3306 10 Connect_Retry: 60 11 Master_Log_File: mysql-bin.000091 12 Read_Master_Log_Pos: 194 13 Relay_Log_File: relay-bin.000031 14 Relay_Log_Pos: 407 15 Relay_Master_Log_File: mysql-bin.000091 16 Slave_IO_Running: No //IO_Thread has stopped. 17 Slave_SQL_Running: No //SQL_Thread has stopped. 18 Replicate_Do_DB: 19 Replicate_Ignore_DB: 20 Replicate_Do_Table: 21 Replicate_Ignore_Table: 22 Replicate_Wild_Do_Table: 23 Replicate_Wild_Ignore_Table: 24 Last_Errno: 0 25 Last_Error: 26 Skip_Counter: 0 27 Exec_Master_Log_Pos: 194 28 Relay_Log_Space: 861 29 Until_Condition: None 30 Until_Log_File: 31 Until_Log_Pos: 0 32 Master_SSL_Allowed: No 33 Master_SSL_CA_File: 34 Master_SSL_CA_Path: 35 Master_SSL_Cert: 36 Master_SSL_Cipher: 37 Master_SSL_Key: 38 Seconds_Behind_Master: NULL 39 Master_SSL_Verify_Server_Cert: No 40 Last_IO_Errno: 0 41 Last_IO_Error: 42 Last_SQL_Errno: 0 43 Last_SQL_Error: 44 Replicate_Ignore_Server_Ids: 45 Master_Server_Id: 1003306 46 Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e 47 Master_Info_File: /data/mysql/mysql3306/data/master.info 48 SQL_Delay: 0 49 SQL_Remaining_Delay: NULL 50 Slave_SQL_Running_State: 51 Master_Retry_Count: 86400 52 Master_Bind: 53 Last_IO_Error_Timestamp: 54 Last_SQL_Error_Timestamp: 55 Master_SSL_Crl: 56 Master_SSL_Crlpath: 57 Retrieved_Gtid_Set: 2a4b3562-2ab6-11e8-be7a-080027de0e0e:12715627 58 Executed_Gtid_Set: 27af30ca-6800-11e8-ad7e-080027de0e0e:1, 59 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715627 60 Auto_Position: 1 61 Replicate_Rewrite_DB: 62 Channel_Name: 63 Master_TLS_Version: 64 1 row in set (0.00 sec) 65 66 //Wait for a while and check it again. 67 68 zlm@192.168.56.101:3306 [(none)]>show slave status\G 69 *************************** 1. row *************************** 70 Slave_IO_State: Waiting for master to send event 71 Master_Host: 192.168.56.100 72 Master_User: repl 73 Master_Port: 3306 74 Connect_Retry: 60 75 Master_Log_File: mysql-bin.000091 76 Read_Master_Log_Pos: 194 77 Relay_Log_File: relay-bin.000032 78 Relay_Log_Pos: 407 79 Relay_Master_Log_File: mysql-bin.000091 80 Slave_IO_Running: Yes //IO_Thread has started again. 81 Slave_SQL_Running: Yes //SQL_Thread has started again. 82 Replicate_Do_DB: 83 Replicate_Ignore_DB: 84 Replicate_Do_Table: 85 Replicate_Ignore_Table: 86 Replicate_Wild_Do_Table: 87 Replicate_Wild_Ignore_Table: 88 Last_Errno: 0 89 Last_Error: 90 Skip_Counter: 0 91 Exec_Master_Log_Pos: 194 92 Relay_Log_Space: 861 93 Until_Condition: None 94 Until_Log_File: 95 Until_Log_Pos: 0 96 Master_SSL_Allowed: No 97 Master_SSL_CA_File: 98 Master_SSL_CA_Path: 99 Master_SSL_Cert: 100 Master_SSL_Cipher: 101 Master_SSL_Key: 102 Seconds_Behind_Master: 0 103 Master_SSL_Verify_Server_Cert: No 104 Last_IO_Errno: 0 105 Last_IO_Error: 106 Last_SQL_Errno: 0 107 Last_SQL_Error: 108 Replicate_Ignore_Server_Ids: 109 Master_Server_Id: 1003306 110 Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e 111 Master_Info_File: /data/mysql/mysql3306/data/master.info 112 SQL_Delay: 0 113 SQL_Remaining_Delay: NULL 114 Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates 115 Master_Retry_Count: 86400 116 Master_Bind: 117 Last_IO_Error_Timestamp: 118 Last_SQL_Error_Timestamp: 119 Master_SSL_Crl: 120 Master_SSL_Crlpath: 121 Retrieved_Gtid_Set: 2a4b3562-2ab6-11e8-be7a-080027de0e0e:12715627 122 Executed_Gtid_Set: 27af30ca-6800-11e8-ad7e-080027de0e0e:1, 123 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715627 124 Auto_Position: 1 125 Replicate_Rewrite_DB: 126 Channel_Name: 127 Master_TLS_Version: 128 1 row in set (0.00 sec)
The output is added one line which shows the connection status of slave on master.
1 # A software update is available: 2 2018-07-21T12:18:15 P=3306,h=192.168.56.101,p=...,u=repl relay-bin.000030 407 0 3 2018-07-21T12:21:27 P=3306,h=192.168.56.101,p=...,u=repl relay-bin.000031 407 0
Check the general log of slave to see the details.
1 [root@zlm2 12:21:46 /data/mysql/mysql3306/data] 2 #tail -f zlm2.log 3 mysqld, Version: 5.7.21-log (MySQL Community Server (GPL)). started with: 4 Tcp port: 3306 Unix socket: /var/lib/mysql/mysql.sock 5 Time Id Command Argument 6 2018-07-21T10:21:14.049221Z 30 Query stop slave 7 2018-07-21T10:21:21.598248Z 30 Query show slave status 8 2018-07-21T10:21:27.080665Z 31 Query SHOW SLAVE STATUS 9 2018-07-21T10:21:27.081848Z 31 Query START SLAVE 10 2018-07-21T10:21:27.084221Z 36 Connect Out repl@192.168.56.100:3306 11 2018-07-21T10:21:28.086327Z 31 Query SHOW SLAVE STATUS 12 2018-07-21T10:21:46.570310Z 35 Quit 13 2018-07-21T10:22:02.782934Z 38 Connect zlm@zlm2 on using TCP/IP 14 2018-07-21T10:22:02.784208Z 38 Query select @@version_comment limit 1 15 2018-07-21T10:22:02.785102Z 38 Query select USER() 16 2018-07-21T10:22:10.440909Z 38 Query show slave status 17 2018-07-21T10:22:14.535233Z 38 Query show slave status 18 2018-07-21T10:22:32.087271Z 31 Query SHOW SLAVE STATUS 19 ...
2.Test of skipping error of replication.
Do some operations on master.
1 zlm@192.168.56.100:3306 [zlm]>show tables; 2 +---------------+ 3 | Tables_in_zlm | 4 +---------------+ 5 | test_gbk | 6 | test_utf8 | 7 | test_utf8mb4 | 8 +---------------+ 9 3 rows in set (0.00 sec) 10 11 zlm@192.168.56.100:3306 [zlm]>drop table test_gbk,test_utf8,test_utf8mb4; 12 Query OK, 0 rows affected (0.04 sec) 13 14 zlm@192.168.56.100:3306 [zlm]>show tables; 15 Empty set (0.00 sec) 16 17 zlm@192.168.56.100:3306 [zlm]>create table test( 18 -> id int, 19 -> name varchar(10) 20 -> ) engine=innodb; 21 Query OK, 0 rows affected (0.03 sec) 22 23 zlm@192.168.56.100:3306 [zlm]>alter table test add primary key(id); 24 Query OK, 0 rows affected (0.01 sec) 25 Records: 0 Duplicates: 0 Warnings: 0 26 27 zlm@192.168.56.100:3306 [zlm]>alter table test modify column id int unsigned not null auto_increment; 28 Query OK, 0 rows affected (0.02 sec) 29 Records: 0 Duplicates: 0 Warnings: 0 30 31 zlm@192.168.56.100:3306 [zlm]>show create table test\G 32 *************************** 1. row *************************** 33 Table: test 34 Create Table: CREATE TABLE `test` ( 35 `id` int(10) unsigned NOT NULL AUTO_INCREMENT, 36 `name` varchar(10) DEFAULT NULL, 37 PRIMARY KEY (`id`) 38 ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8 39 1 row in set (0.00 sec) 40 41 zlm@192.168.56.100:3306 [zlm]>insert into test values(1,'zlm'); 42 Query OK, 1 row affected (0.00 sec) 43 44 zlm@192.168.56.100:3306 [zlm]>select * from test; 45 +----+------+ 46 | id | name | 47 +----+------+ 48 | 1 | zlm | 49 +----+------+ 50 1 row in set (0.00 sec)
Check the slave status on slave.
1 zlm@192.168.56.101:3306 [zlm]>show slave status\G 2 *************************** 1. row *************************** 3 Slave_IO_State: Waiting for master to send event 4 Master_Host: 192.168.56.100 5 Master_User: repl 6 Master_Port: 3306 7 Connect_Retry: 60 8 Master_Log_File: mysql-bin.000091 9 Read_Master_Log_Pos: 1241 10 Relay_Log_File: relay-bin.000032 11 Relay_Log_Pos: 407 12 Relay_Master_Log_File: mysql-bin.000091 13 Slave_IO_Running: Yes 14 Slave_SQL_Running: No 15 Replicate_Do_DB: 16 Replicate_Ignore_DB: 17 Replicate_Do_Table: 18 Replicate_Ignore_Table: 19 Replicate_Wild_Do_Table: 20 Replicate_Wild_Ignore_Table: 21 Last_Errno: 1051 22 Last_Error: Error 'Unknown table 'zlm.test_gbk,zlm.test_utf8,zlm.test_utf8mb4'' on query. Default database: 'zlm'. Query: 'DROP TABLE `test_gbk`,`test_utf8`,`test_utf8mb4` /* generated by server */' 23 Skip_Counter: 0 24 Exec_Master_Log_Pos: 194 25 Relay_Log_Space: 1908 26 Until_Condition: None 27 Until_Log_File: 28 Until_Log_Pos: 0 29 Master_SSL_Allowed: No 30 Master_SSL_CA_File: 31 Master_SSL_CA_Path: 32 Master_SSL_Cert: 33 Master_SSL_Cipher: 34 Master_SSL_Key: 35 Seconds_Behind_Master: NULL 36 Master_SSL_Verify_Server_Cert: No 37 Last_IO_Errno: 0 38 Last_IO_Error: 39 Last_SQL_Errno: 1051 40 Last_SQL_Error: Error 'Unknown table 'zlm.test_gbk,zlm.test_utf8,zlm.test_utf8mb4'' on query. Default database: 'zlm'. Query: 'DROP TABLE `test_gbk`,`test_utf8`,`test_utf8mb4` /* generated by server */' 41 Replicate_Ignore_Server_Ids: 42 Master_Server_Id: 1003306 43 Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e 44 Master_Info_File: /data/mysql/mysql3306/data/master.info 45 SQL_Delay: 0 46 SQL_Remaining_Delay: NULL 47 Slave_SQL_Running_State: 48 Master_Retry_Count: 86400 49 Master_Bind: 50 Last_IO_Error_Timestamp: 51 Last_SQL_Error_Timestamp: 180721 13:19:25 52 Master_SSL_Crl: 53 Master_SSL_Crlpath: 54 Retrieved_Gtid_Set: 2a4b3562-2ab6-11e8-be7a-080027de0e0e:12715627-12715632 55 Executed_Gtid_Set: 27af30ca-6800-11e8-ad7e-080027de0e0e:1, 56 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715627 57 Auto_Position: 1 58 Replicate_Rewrite_DB: 59 Channel_Name: 60 Master_TLS_Version: 61 1 row in set (0.00 sec) 62 63 zlm@192.168.56.101:3306 [zlm]>show tables; 64 +----------------+ 65 | Tables_in_zlm | 66 +----------------+ 67 | semi_sync_test | 68 | t1 | 69 | t2 | 70 | t3 | 71 | test | 72 | test_flashbk | 73 +----------------+ 74 6 rows in set (0.00 sec) 75 76 //There're no target tables at all.So error 1051 occurs.
Try to skip the error 1051 on master.
1 [root@zlm1 13:18:45 ~] 2 #pt-slave-restart --recurse=0 --error-number=1051 -h192.168.56.101 -P3306 -urepl -prepl4slave 3 2018-07-21T13:35:31 P=3306,h=192.168.56.101,p=...,u=repl relay-bin.000032 407 1051 4 2018-07-21T13:35:31 P=3306,h=192.168.56.101,p=...,u=repl relay-bin.000032 618 1050 5 Error 1050 is not in --error-numbers. //error 1050 occurs then.
Check replicaiton slave status again.
1 zlm@192.168.56.101:3306 [zlm]>show slave status\G 2 *************************** 1. row *************************** 3 Slave_IO_State: Waiting for master to send event 4 Master_Host: 192.168.56.100 5 Master_User: repl 6 Master_Port: 3306 7 Connect_Retry: 60 8 Master_Log_File: mysql-bin.000091 9 Read_Master_Log_Pos: 1241 10 Relay_Log_File: relay-bin.000032 11 Relay_Log_Pos: 618 12 Relay_Master_Log_File: mysql-bin.000091 13 Slave_IO_Running: Yes 14 Slave_SQL_Running: No 15 Replicate_Do_DB: 16 Replicate_Ignore_DB: 17 Replicate_Do_Table: 18 Replicate_Ignore_Table: 19 Replicate_Wild_Do_Table: 20 Replicate_Wild_Ignore_Table: 21 Last_Errno: 1050 22 Last_Error: Error 'Table 'test' already exists' on query. Default database: 'zlm'. Query: 'create table test( 23 id int, 24 name varchar(10) 25 ) engine=innodb' 26 Skip_Counter: 0 27 Exec_Master_Log_Pos: 405 28 Relay_Log_Space: 1908 29 Until_Condition: None 30 Until_Log_File: 31 Until_Log_Pos: 0 32 Master_SSL_Allowed: No 33 Master_SSL_CA_File: 34 Master_SSL_CA_Path: 35 Master_SSL_Cert: 36 Master_SSL_Cipher: 37 Master_SSL_Key: 38 Seconds_Behind_Master: NULL 39 Master_SSL_Verify_Server_Cert: No 40 Last_IO_Errno: 0 41 Last_IO_Error: 42 Last_SQL_Errno: 1050 43 Last_SQL_Error: Error 'Table 'test' already exists' on query. Default database: 'zlm'. Query: 'create table test( 44 id int, 45 name varchar(10) 46 ) engine=innodb' 47 Replicate_Ignore_Server_Ids: 48 Master_Server_Id: 1003306 49 Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e 50 Master_Info_File: /data/mysql/mysql3306/data/master.info 51 SQL_Delay: 0 52 SQL_Remaining_Delay: NULL 53 Slave_SQL_Running_State: 54 Master_Retry_Count: 86400 55 Master_Bind: 56 Last_IO_Error_Timestamp: 57 Last_SQL_Error_Timestamp: 180721 13:35:31 58 Master_SSL_Crl: 59 Master_SSL_Crlpath: 60 Retrieved_Gtid_Set: 2a4b3562-2ab6-11e8-be7a-080027de0e0e:12715627-12715632 61 Executed_Gtid_Set: 27af30ca-6800-11e8-ad7e-080027de0e0e:1, 62 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715628 63 Auto_Position: 1 64 Replicate_Rewrite_DB: 65 Channel_Name: 66 Master_TLS_Version: 67 1 row in set (0.00 sec)
Continue to skip error 1050 on master.
1 [root@zlm1 13:35:31 ~] 2 #pt-slave-restart --recurse=0 --error-number=1050 -h192.168.56.101 -P3306 -urepl -prepl4slave 3 2018-07-21T13:39:43 P=3306,h=192.168.56.101,p=...,u=repl relay-bin.000032 618 1050 4 2018-07-21T13:39:44 P=3306,h=192.168.56.101,p=...,u=repl relay-bin.000032 814 1068 5 Error 1068 is not in --error-numbers.
Check the slave status one more time.
1 zlm@192.168.56.101:3306 [zlm]>show slave status\G 2 *************************** 1. row *************************** 3 Slave_IO_State: Waiting for master to send event 4 Master_Host: 192.168.56.100 5 Master_User: repl 6 Master_Port: 3306 7 Connect_Retry: 60 8 Master_Log_File: mysql-bin.000091 9 Read_Master_Log_Pos: 1241 10 Relay_Log_File: relay-bin.000032 11 Relay_Log_Pos: 814 12 Relay_Master_Log_File: mysql-bin.000091 13 Slave_IO_Running: Yes 14 Slave_SQL_Running: No 15 Replicate_Do_DB: 16 Replicate_Ignore_DB: 17 Replicate_Do_Table: 18 Replicate_Ignore_Table: 19 Replicate_Wild_Do_Table: 20 Replicate_Wild_Ignore_Table: 21 Last_Errno: 1068 22 Last_Error: Error 'Multiple primary key defined' on query. Default database: 'zlm'. Query: 'alter table test add primary key(id)' 23 Skip_Counter: 0 24 Exec_Master_Log_Pos: 601 25 Relay_Log_Space: 1908 26 Until_Condition: None 27 Until_Log_File: 28 Until_Log_Pos: 0 29 Master_SSL_Allowed: No 30 Master_SSL_CA_File: 31 Master_SSL_CA_Path: 32 Master_SSL_Cert: 33 Master_SSL_Cipher: 34 Master_SSL_Key: 35 Seconds_Behind_Master: NULL 36 Master_SSL_Verify_Server_Cert: No 37 Last_IO_Errno: 0 38 Last_IO_Error: 39 Last_SQL_Errno: 1068 40 Last_SQL_Error: Error 'Multiple primary key defined' on query. Default database: 'zlm'. Query: 'alter table test add primary key(id)' 41 Replicate_Ignore_Server_Ids: 42 Master_Server_Id: 1003306 43 Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e 44 Master_Info_File: /data/mysql/mysql3306/data/master.info 45 SQL_Delay: 0 46 SQL_Remaining_Delay: NULL 47 Slave_SQL_Running_State: 48 Master_Retry_Count: 86400 49 Master_Bind: 50 Last_IO_Error_Timestamp: 51 Last_SQL_Error_Timestamp: 180721 13:39:43 52 Master_SSL_Crl: 53 Master_SSL_Crlpath: 54 Retrieved_Gtid_Set: 2a4b3562-2ab6-11e8-be7a-080027de0e0e:12715627-12715632 55 Executed_Gtid_Set: 27af30ca-6800-11e8-ad7e-080027de0e0e:1, 56 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715629 57 Auto_Position: 1 58 Replicate_Rewrite_DB: 59 Channel_Name: 60 Master_TLS_Version: 61 1 row in set (0.00 sec) 62 63 zlm@192.168.56.101:3306 [zlm]>show create table test\G 64 *************************** 1. row *************************** 65 Table: test 66 Create Table: CREATE TABLE `test` ( 67 `id` bigint(20) NOT NULL AUTO_INCREMENT, 68 `name` varchar(20) NOT NULL DEFAULT '', 69 PRIMARY KEY (`id`) //The table "test" had been added primary key what hit the error. 70 ) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8mb4 71 1 row in set (0.00 sec)
Continue to skip error 1068 on master.
1 [root@zlm1 13:39:44 ~] 2 #pt-slave-restart --recurse=0 --error-number=1068 -h192.168.56.101 -P3306 -urepl -prepl4slave 3 2018-07-21T13:50:39 P=3306,h=192.168.56.101,p=...,u=repl relay-bin.000032 814 1068 4 2018-07-21T13:50:42 P=3306,h=192.168.56.101,p=...,u=repl relay-bin.000032 1194 1677 5 Error 1677 is not in --error-numbers.
Check the slave status the fourth times.
1 zlm@192.168.56.101:3306 [zlm]>show slave status\G 2 *************************** 1. row *************************** 3 Slave_IO_State: Waiting for master to send event 4 Master_Host: 192.168.56.100 5 Master_User: repl 6 Master_Port: 3306 7 Connect_Retry: 60 8 Master_Log_File: mysql-bin.000091 9 Read_Master_Log_Pos: 1241 10 Relay_Log_File: relay-bin.000032 11 Relay_Log_Pos: 1194 12 Relay_Master_Log_File: mysql-bin.000091 13 Slave_IO_Running: Yes 14 Slave_SQL_Running: No 15 Replicate_Do_DB: 16 Replicate_Ignore_DB: 17 Replicate_Do_Table: 18 Replicate_Ignore_Table: 19 Replicate_Wild_Do_Table: 20 Replicate_Wild_Ignore_Table: 21 Last_Errno: 1677 22 Last_Error: Column 1 of table 'zlm.test' cannot be converted from type 'varchar(30(bytes))' to type 'varchar(80(bytes) utf8mb4)' 23 Skip_Counter: 0 24 Exec_Master_Log_Pos: 981 25 Relay_Log_Space: 1908 26 Until_Condition: None 27 Until_Log_File: 28 Until_Log_Pos: 0 29 Master_SSL_Allowed: No 30 Master_SSL_CA_File: 31 Master_SSL_CA_Path: 32 Master_SSL_Cert: 33 Master_SSL_Cipher: 34 Master_SSL_Key: 35 Seconds_Behind_Master: NULL 36 Master_SSL_Verify_Server_Cert: No 37 Last_IO_Errno: 0 38 Last_IO_Error: 39 Last_SQL_Errno: 1677 40 Last_SQL_Error: Column 1 of table 'zlm.test' cannot be converted from type 'varchar(30(bytes))' to type 'varchar(80(bytes) utf8mb4)' 41 Replicate_Ignore_Server_Ids: 42 Master_Server_Id: 1003306 43 Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e 44 Master_Info_File: /data/mysql/mysql3306/data/master.info 45 SQL_Delay: 0 46 SQL_Remaining_Delay: NULL 47 Slave_SQL_Running_State: 48 Master_Retry_Count: 86400 49 Master_Bind: 50 Last_IO_Error_Timestamp: 51 Last_SQL_Error_Timestamp: 180721 13:50:42 52 Master_SSL_Crl: 53 Master_SSL_Crlpath: 54 Retrieved_Gtid_Set: 2a4b3562-2ab6-11e8-be7a-080027de0e0e:12715627-12715632 55 Executed_Gtid_Set: 27af30ca-6800-11e8-ad7e-080027de0e0e:1, 56 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715631 57 Auto_Position: 1 58 Replicate_Rewrite_DB: 59 Channel_Name: 60 Master_TLS_Version: 61 1 row in set (0.00 sec) 62 63 //Table "test" on master is utf8 menas its column varchar(10) contain 30 bytes 64 //Table "test" on slave is utf8mb4 means its column varchar(20) contain 80 bytes. 65 //They cannot be converted.Thus the error 1677 occurs.
Continue to skip error 1677 on master.
1 [root@zlm1 13:50:42 ~] 2 #pt-slave-restart --recurse=0 --error-number=1677 -h192.168.56.101 -P3306 -urepl -prepl4slave 3 2018-07-21T13:59:24 P=3306,h=192.168.56.101,p=...,u=repl relay-bin.000032 1194 1677 4 5 //No more error appears anymore.
Check the slave status the fifth times.
1 zlm@192.168.56.101:3306 [zlm]>show slave status\G 2 *************************** 1. row *************************** 3 Slave_IO_State: Waiting for master to send event 4 Master_Host: 192.168.56.100 5 Master_User: repl 6 Master_Port: 3306 7 Connect_Retry: 60 8 Master_Log_File: mysql-bin.000091 9 Read_Master_Log_Pos: 1241 10 Relay_Log_File: relay-bin.000032 11 Relay_Log_Pos: 1454 12 Relay_Master_Log_File: mysql-bin.000091 13 Slave_IO_Running: Yes 14 Slave_SQL_Running: Yes //The SQL_Thread is normal now.No errors found. 15 Replicate_Do_DB: 16 Replicate_Ignore_DB: 17 Replicate_Do_Table: 18 Replicate_Ignore_Table: 19 Replicate_Wild_Do_Table: 20 Replicate_Wild_Ignore_Table: 21 Last_Errno: 0 22 Last_Error: 23 Skip_Counter: 0 24 Exec_Master_Log_Pos: 1241 25 Relay_Log_Space: 1908 26 Until_Condition: None 27 Until_Log_File: 28 Until_Log_Pos: 0 29 Master_SSL_Allowed: No 30 Master_SSL_CA_File: 31 Master_SSL_CA_Path: 32 Master_SSL_Cert: 33 Master_SSL_Cipher: 34 Master_SSL_Key: 35 Seconds_Behind_Master: 0 36 Master_SSL_Verify_Server_Cert: No 37 Last_IO_Errno: 0 38 Last_IO_Error: 39 Last_SQL_Errno: 0 40 Last_SQL_Error: 41 Replicate_Ignore_Server_Ids: 42 Master_Server_Id: 1003306 43 Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e 44 Master_Info_File: /data/mysql/mysql3306/data/master.info 45 SQL_Delay: 0 46 SQL_Remaining_Delay: NULL 47 Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates 48 Master_Retry_Count: 86400 49 Master_Bind: 50 Last_IO_Error_Timestamp: 51 Last_SQL_Error_Timestamp: 52 Master_SSL_Crl: 53 Master_SSL_Crlpath: 54 Retrieved_Gtid_Set: 2a4b3562-2ab6-11e8-be7a-080027de0e0e:12715627-12715632 55 Executed_Gtid_Set: 27af30ca-6800-11e8-ad7e-080027de0e0e:1, 56 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715632 57 Auto_Position: 1 58 Replicate_Rewrite_DB: 59 Channel_Name: 60 Master_TLS_Version: 61 1 row in set (0.01 sec) 62 63 //Util now,we've skipped all the errors dynamically.But it doesn't guarantee the consistency of table "test" at all.
See what will happen on test of slave.
1 zlm@192.168.56.101:3306 [zlm]>show create table test\G 2 *************************** 1. row *************************** 3 Table: test 4 Create Table: CREATE TABLE `test` ( 5 `id` int(10) unsigned NOT NULL AUTO_INCREMENT, 6 `name` varchar(20) NOT NULL DEFAULT '', 7 PRIMARY KEY (`id`) 8 ) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8mb4 9 1 row in set (0.00 sec) 10 11 zlm@192.168.56.101:3306 [zlm]>select count(*) from test; 12 +----------+ 13 | count(*) | 14 +----------+ 15 | 100000 | 16 +----------+ 17 1 row in set (0.02 sec) 18 19 //No change happened.
Summary
- pt-slave-restart is tool use to avoid slave replication terminated accidentally due to different errors.
- On account of "skip-slave-errors" is a static parameter,pt-slave-restart is more convenient to skip errors dynamically.
- Unless you know the influence or consequences of the skipping operation.Don't skip any errors automatically.It's not safe to keep data consistency.
- It's not recommended to use parameter "--aways",factitiously check and restart is more reliable.