场景:
昨天业务系统上遇到了数据库慢的问题(对dcsdba.og_file_audit表的insert 慢&超时)、分析后定位到是由于锁等待造成的、分析过程如下:
1、执行show processlist 以此得到一个总体的把握
show processlist;
+--------+--------+----------------------+--------+------------------+--------+-----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+--------+--------+----------------------+--------+------------------+--------+-----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| 231071 | ha_op | 127.0.0.1:38095 | NULL | Sleep | 0 | | NULL |
| 231092 | ha_op | 127.0.0.1:38135 | NULL | Sleep | 0 | | NULL |
| 233790 | repl | 192.168.56.15:38898 | NULL | Binlog Dump GTID | 523702 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL |
| 264558 | dcsdba | 192.168.56.44:59175 | NULL | Sleep | 22219 | | NULL |
| 264618 | dcsdba | 192.168.56.44:59563 | NULL | Sleep | 22144 | | NULL |
| 264619 | dcsdba | 192.168.56.44:59599 | dcsdba | Sleep | 35 | | NULL |
| 264620 | dcsdba | 192.168.56.44:59642 | NULL | Sleep | 22133 | | NULL |
| 319312 | dcsdba | 172.17.248.35:64570 | NULL | Sleep | 572 | | NULL |
| 319313 | dcsdba | 172.17.248.35:64571 | NULL | Sleep | 572 | | NULL |
| 319532 | root | 127.0.0.1:49463 | dcsdba | Sleep | 18603 | | NULL |
| 319533 | root | 127.0.0.1:49464 | pubdba | Sleep | 10548 | | NULL |
| 319534 | root | 127.0.0.1:49469 | NULL | Sleep | 9944 | | NULL |
| 319535 | root | 127.0.0.1:49470 | NULL | Sleep | 9944 | | NULL |
| 319806 | dcsdba | 192.168.56.50:54548 | NULL | Sleep | 6743 | | NULL |
| 319808 | dcsdba | 192.168.56.48:54762 | NULL | Sleep | 9038 | | NULL |
| 319812 | dcsdba | 192.168.56.47:45615 | NULL | Sleep | 9031 | | NULL |
| 319814 | dcsdba | 192.168.56.47:59922 | NULL | Sleep | 8567 | | NULL |
| 319816 | dcsdba | 192.168.56.52:43960 | NULL | Sleep | 6744 | | NULL |
| 319818 | dcsdba | 192.168.56.47:39697 | NULL | Sleep | 8470 | | NULL |
| 319819 | dcsdba | 192.168.56.50:38430 | NULL | Sleep | 8610 | | NULL |
| 319820 | dcsdba | 192.168.56.47:53632 | NULL | Sleep | 8459 | | NULL |
| 319822 | dcsdba | 192.168.56.52:43974 | NULL | Sleep | 6744 | | NULL |
| 319825 | dcsdba | 192.168.56.50:38441 | dcsdba | Sleep | 8611 | | NULL |
| 319826 | dcsdba | 192.168.56.50:54563 | NULL | Sleep | 6743 | | NULL |
| 319831 | dcsdba | 192.168.56.48:54774 | dcsdba | Sleep | 9043 | | NULL |
| 319833 | dcsdba | 192.168.56.47:39713 | dcsdba | Sleep | 8470 | | NULL |
| 319834 | dcsdba | 192.168.56.47:59942 | dcsdba | Sleep | 8567 | | NULL |
| 319835 | dcsdba | 192.168.56.47:53647 | NULL | Sleep | 8459 | | NULL |
| 319836 | dcsdba | 192.168.56.47:45640 | dcsdba | Sleep | 9031 | | NULL |
| 319838 | dcsdba | 192.168.56.49:54283 | NULL | Sleep | 8457 | | NULL |
| 319840 | dcsdba | 192.168.56.49:54291 | dcsdba | Sleep | 8457 | | NULL |
| 319841 | dcsdba | 192.168.56.49:54209 | NULL | Sleep | 9431 | | NULL |
| 319842 | dcsdba | 192.168.56.49:54213 | dcsdba | Sleep | 9431 | | NULL |
| 319846 | dcsdba | 192.168.56.50:51212 | NULL | Sleep | 9035 | | NULL |
| 319847 | dcsdba | 192.168.56.52:33242 | NULL | Sleep | 9429 | | NULL |
| 319848 | dcsdba | 192.168.56.47:57585 | NULL | Sleep | 9036 | | NULL |
| 319849 | dcsdba | 192.168.56.52:33246 | dcsdba | Sleep | 9429 | | NULL |
| 319850 | dcsdba | 192.168.56.50:51216 | dcsdba | Sleep | 9035 | | NULL |
| 319851 | dcsdba | 192.168.56.47:57589 | dcsdba | Sleep | 9036 | | NULL |
| 378099 | dcsdba | 192.168.56.51:33024 | NULL | Sleep | 9395 | | NULL |
| 378100 | dcsdba | 192.168.56.51:33025 | dcsdba | Sleep | 9400 | | NULL |
| 380314 | dcsdba | 192.168.56.52:53073 | NULL | Sleep | 9395 | | NULL |
| 380315 | dcsdba | 192.168.56.52:53074 | dcsdba | Sleep | 9395 | | NULL |
| 384041 | dcsdba | 192.168.56.41:51909 | NULL | Sleep | 22517 | | NULL |
| 384042 | dcsdba | 192.168.56.41:51910 | NULL | Sleep | 22517 | | NULL |
| 384058 | dcsdba | 192.168.56.41:51917 | NULL | Sleep | 22500 | | NULL |
| 384059 | dcsdba | 192.168.56.41:51918 | NULL | Sleep | 22500 | | NULL |
| 384418 | dcsdba | 192.168.56.41:51988 | NULL | Sleep | 23149 | | NULL |
| 384419 | dcsdba | 192.168.56.41:51989 | dcsdba | Sleep | 23149 | | NULL |
| 384420 | dcsdba | 192.168.56.41:51993 | NULL | Sleep | 23159 | | NULL |
| 384421 | dcsdba | 192.168.56.41:51994 | dcsdba | Sleep | 23159 | | NULL |
| 384798 | dcsdba | 192.168.56.41:52031 | NULL | Sleep | 21736 | | NULL |
| 384804 | dcsdba | 192.168.56.41:52033 | NULL | Sleep | 21736 | | NULL |
| 386046 | dcsdba | 192.168.56.41:52109 | NULL | Sleep | 22232 | | NULL |
| 386047 | dcsdba | 192.168.56.41:52111 | NULL | Sleep | 22232 | | NULL |
| 420112 | dcsdba | 192.168.56.48:48590 | NULL | Sleep | 9033 | | NULL |
| 420184 | dcsdba | 192.168.56.48:48594 | dcsdba | Sleep | 9033 | | NULL |
| 438815 | dcsdba | 192.168.56.52:39248 | NULL | Sleep | 8609 | | NULL |
| 438831 | dcsdba | 192.168.56.52:39258 | dcsdba | Sleep | 8609 | | NULL |
| 439411 | dcsdba | 192.168.56.50:51134 | NULL | Sleep | 9395 | | NULL |
| 439433 | dcsdba | 192.168.56.50:51140 | dcsdba | Sleep | 9395 | | NULL |
| 444544 | dcsdba | 192.168.56.51:51541 | NULL | Sleep | 8609 | | NULL |
| 444559 | dcsdba | 192.168.56.51:51549 | dcsdba | Sleep | 8609 | | NULL |
| 447213 | dcsdba | 192.168.56.51:44704 | NULL | Sleep | 8466 | | NULL |
| 447219 | dcsdba | 192.168.56.51:44711 | dcsdba | Sleep | 8467 | | NULL |
| 448744 | dcsdba | 172.17.248.73:58210 | NULL | Sleep | 3598 | | NULL |
| 448745 | dcsdba | 172.17.248.73:58211 | NULL | Sleep | 3509 | | NULL |
| 448843 | dcsdba | 192.168.56.44:55427 | dcsdba | Execute | 197 | update | insert into dcsdba.serv_proc_stat(PROC_ID,PROC_TM,SERV_CD,OPERATE_CD,RET_CD,SERV_START_TM,SERV_END_T |
| 449160 | dcsdba | 127.0.0.1:51490 | pubdba | Sleep | 3776 | | NULL |
| 449161 | dcsdba | 127.0.0.1:51491 | NULL | Sleep | 3776 | | NULL |
| 449162 | dcsdba | 127.0.0.1:51492 | dcsdba | Sleep | 3776 | | NULL |
| 449163 | dcsdba | 127.0.0.1:51493 | dcsdba | Sleep | 3776 | | NULL |
| 449226 | dcsdba | 192.168.56.56:42323 | dcsdba | Sleep | 807 | | NULL |
| 449567 | dcsdba | 192.168.56.49:58354 | NULL | Sleep | 1332 | | NULL |
| 449568 | dcsdba | 192.168.56.48:54963 | NULL | Sleep | 118 | | NULL |
| 449572 | dcsdba | 192.168.56.49:58361 | NULL | Execute | 110 | update | insert into dcsdba.og_file_audit ( cust_carrier_cd,partner_carrier_cd,biz_line,biz_pkg,og_file_nm,re |
| 449573 | dcsdba | 192.168.56.48:54969 | NULL | Execute | 118 | update | insert into dcsdba.og_file_audit ( cust_carrier_cd,partner_carrier_cd,biz_line,biz_pkg,og_file_nm,re |
| 449611 | root | 127.0.0.1:51535 | dcsdba | Query | 0 | init | show processlist |
| 449768 | dcsdba | 172.17.248.116:61544 | NULL | Sleep | 1249 | | NULL |
| 449956 | dcsdba | 192.168.56.41:55305 | NULL | Sleep | 555 | | NULL |
| 449958 | dcsdba | 192.168.56.41:55326 | NULL | Execute | 394 | updating | DELETE FROM dcsdba.rap_og_err_fatal WHERE tap_file_nm = 'CDARGTMHKGPP02959' AND (((biz_pkg = |
| 450058 | pubdba | 192.168.56.49:42441 | pubdba | Sleep | 71 | | NULL |
| 450072 | dcsdba | 192.168.56.49:42461 | dcsdba | Sleep | 112 | | NULL |
| 450078 | pubdba | 192.168.56.49:42469 | pubdba | Sleep | 17 | | NULL |
| 450081 | dcsdba | 192.168.56.49:42473 | dcsdba | Sleep | 92 | | NULL |
| 450082 | dcsdba | 192.168.56.49:42474 | dcsdba | Sleep | 72 | | NULL |
| 450085 | pubdba | 192.168.56.49:42479 | pubdba | Sleep | 91 | | NULL |
| 450088 | pubdba | 192.168.56.49:42482 | pubdba | Sleep | 71 | | NULL |
| 450091 | dcsdba | 192.168.56.49:42487 | dcsdba | Sleep | 52 | | NULL |
| 450092 | dcsdba | 192.168.56.49:42488 | dcsdba | Sleep | 32 | | NULL |
| 450095 | pubdba | 192.168.56.49:42491 | pubdba | Sleep | 31 | | NULL |
| 450096 | dcsdba | 192.168.56.49:42493 | dcsdba | Sleep | 32 | | NULL |
| 450097 | dcsdba | 192.168.56.49:42494 | dcsdba | Sleep | 32 | | NULL |
| 450098 | pubdba | 192.168.56.49:42495 | pubdba | Sleep | 31 | | NULL |
| 450099 | pubdba | 192.168.56.49:42496 | pubdba | Sleep | 31 | | NULL |
| 450100 | dcsdba | 192.168.56.49:42499 | dcsdba | Sleep | 12 | | NULL |
| 450101 | dcsdba | 192.168.56.49:42500 | dcsdba | Sleep | 12 | | NULL |
| 450102 | dcsdba | 192.168.56.49:42501 | dcsdba | Sleep | 12 | | NULL |
| 450103 | pubdba | 192.168.56.49:42502 | pubdba | Sleep | 11 | | NULL |
| 450104 | pubdba | 192.168.56.49:42503 | pubdba | Sleep | 11 | | NULL |
| 450105 | pubdba | 192.168.56.49:42504 | pubdba | Sleep | 11 | | NULL |
+--------+--------+----------------------+--------+------------------+--------+-----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
由show processlist的结果中没能看出来MySQL有多繁忙、那为什么会卡住了呢?
2、查看innodb层面有没有锁等待
select * from information_schema.INNODB_LOCK_WAITS;
+-------------------+--------------------------+-----------------+--------------------------+
| requesting_trx_id | requested_lock_id | blocking_trx_id | blocking_lock_id |
+-------------------+--------------------------+-----------------+--------------------------+
| 343988284 | 343988284:84907:2168:14 | 343987421 | 343987421:84907:2168:14 |
| 343988282 | 343988282:84907:35190:23 | 343987421 | 343987421:84907:35190:23 |
| 343988274 | 343988274:88708:603:78 | 343987421 | 343987421:88708:603:78 |
| 343988262 | 343988262:88356:4:165 | 343987421 | 343987421:88356:4:165 |
+-------------------+--------------------------+-----------------+--------------------------+
哈哈哈、这下有所发现了;原来有四个事务都被 343987421 这一个事务给阻塞了
3、进一步查看锁信息
select * from information_schema.INNODB_LOCKS;
+--------------------------+-------------+-----------+-----------+-------------------------------------------------------+------------+------------+-----------+----------+------------------------------------------------------------------------------------------------------+
| lock_id | lock_trx_id | lock_mode | lock_type | lock_table | lock_index | lock_space | lock_page | lock_rec | lock_data |
+--------------------------+-------------+-----------+-----------+-------------------------------------------------------+------------+------------+-----------+----------+------------------------------------------------------------------------------------------------------+
| 343988284:84907:2168:14 | 343988284 | X,GAP | RECORD | `dcsdba`.`og_file_audit` | PRIMARY | 84907 | 2168 | 14 | 'MACCT', 'CDUZB05MACCT01478' |
| 343987421:84907:2168:14 | 343987421 | X | RECORD | `dcsdba`.`og_file_audit` | PRIMARY | 84907 | 2168 | 14 | 'MACCT', 'CDUZB05MACCT01478' |
| 343988282:84907:35190:23 | 343988282 | X,GAP | RECORD | `dcsdba`.`og_file_audit` | PRIMARY | 84907 | 35190 | 23 | 'MACCT', 'CDMACCTINDBI02651' |
| 343987421:84907:35190:23 | 343987421 | X | RECORD | `dcsdba`.`og_file_audit` | PRIMARY | 84907 | 35190 | 23 | 'MACCT', 'CDMACCTINDBI02651' |
| 343988274:88708:603:78 | 343988274 | X,GAP | RECORD | `dcsdba`.`serv_proc_stat` /* Partition `P20161208` */ | PRIMARY | 88708 | 603 | 78 | 'CDPAKPLAREDU06692.gz', 0x999B10D67A, 20161208, 'CdrRating', 'CdrRating', 0x999B110A77, 0x999B110B17 |
| 343987421:88708:603:78 | 343987421 | X | RECORD | `dcsdba`.`serv_proc_stat` /* Partition `P20161208` */ | PRIMARY | 88708 | 603 | 78 | 'CDPAKPLAREDU06692.gz', 0x999B10D67A, 20161208, 'CdrRating', 'CdrRating', 0x999B110A77, 0x999B110B17 |
| 343988262:88356:4:165 | 343988262 | X | RECORD | `dcsdba`.`rap_og_err_fatal` | PRIMARY | 88356 | 4 | 165 | 'RCBGDWTAFGEA00067', 'CDAFGEABGDWT02432', 1, 'BGDWT' |
| 343987421:88356:4:165 | 343987421 | X | RECORD | `dcsdba`.`rap_og_err_fatal` | PRIMARY | 88356 | 4 | 165 | 'RCBGDWTAFGEA00067', 'CDAFGEABGDWT02432', 1, 'BGDWT' |
+--------------------------+-------------+-----------+-----------+-------------------------------------------------------+------------+------------+-----------+----------+------------------------------------------------------------------------------------------------------+
哈哈哈、看到dcsdba.og_file_audit这个表了,说明在这个表上的操作确实是在阻塞中;但是还是没能看到是不是insert 操作、还要进行下一步分析。
4、查看innodb层面正在执行的事务
select * from information_schema.INNODB_TRX;
+-----------+-----------+---------------------+--------------------------+---------------------+------------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+------------------+----------------------------+
| trx_id | trx_state | trx_started | trx_requested_lock_id | trx_wait_started | trx_weight | trx_mysql_thread_id | trx_query | trx_operation_state | trx_tables_in_use | trx_tables_locked | trx_lock_structs | trx_lock_memory_bytes | trx_rows_locked | trx_rows_modified | trx_concurrency_tickets | trx_isolation_level | trx_unique_checks | trx_foreign_key_checks | trx_last_foreign_key_error | trx_adaptive_hash_latched | trx_adaptive_hash_timeout | trx_is_read_only | trx_autocommit_non_locking |
+-----------+-----------+---------------------+--------------------------+---------------------+------------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+------------------+----------------------------+
| 343988284 | LOCK WAIT | 2016-12-08 19:47:27 | 343988284:84907:2168:14 | 2016-12-08 19:47:27 | 2 | 449572 | insert into dcsdba.og_file_audit ( cust_carrier_cd,partner_carrier_cd,biz_line,biz_pkg,og_file_nm,rec_count, prv_currency_type, currency_type, prv_chg_intax, chg_intax,earliest_start_tm,latest_start_tm,sttl_flag,sttl_dt,db_insr_tm,seq_num,prv_vat,vat ,prv_tax,tax ,prv_chg,chg,sub_count,cksum,tap_decimal,file_size,tap_format,v_cust_carrier_cd,v_partner_carrier_cd,file_create_tm,file_cutoff_tm,file_available_tm,chgval_count,bill_count,rebill_count,usrbill_count,merged_rec_count,total_vat,total_tax,total_chg) values ('MACCT','USAW6',2,1,'CDUSAW6MACCT06342',215,'SDR','SDR',2967200,2967200,str_to_date('','%Y%m%d%H%i%s'),str_to_date('','%Y%m%d%H%i%s'),0,20161208,now(),6342,0,0,0,0,2967200,2967200,33,'',3,66419,6,'MACCT','USAW6',str_to_date('','%Y%m%d%H%i%s'),str_to_date('','%Y%m%d%H%i%s'),str_to_date('','%Y%m%d%H%i%s'),0,0,0,0,0,2967200,0,2967200) | inserting | 1 | 1 | 2 | 360 | 1 | 0 | 0 | REPEATABLE READ | 1 | 1 | NULL | 0 | 9795 | 0 | 0 |
| 343988282 | LOCK WAIT | 2016-12-08 19:47:19 | 343988282:84907:35190:23 | 2016-12-08 19:47:19 | 2 | 449573 | insert into dcsdba.og_file_audit ( cust_carrier_cd,partner_carrier_cd,biz_line,biz_pkg,og_file_nm,rec_count, prv_currency_type, currency_type, prv_chg_intax, chg_intax,earliest_start_tm,latest_start_tm,sttl_flag,sttl_dt,db_insr_tm,seq_num,prv_vat,vat ,prv_tax,tax ,prv_chg,chg,sub_count,cksum,tap_decimal,file_size,tap_format,v_cust_carrier_cd,v_partner_carrier_cd,file_create_tm,file_cutoff_tm,file_available_tm,chgval_count,bill_count,rebill_count,usrbill_count,merged_rec_count,total_vat,total_tax,total_chg) values ('MACCT','INDB1',1,1,'CDMACCTINDB106353',0,'SDR','SDR',0,0,str_to_date('','%Y%m%d%H%i%s'),str_to_date('','%Y%m%d%H%i%s'),0,20161208,now(),6353,0,0,0,0,0,0,0,'',0,123,6,'MACCT','INDB1',str_to_date('','%Y%m%d%H%i%s'),str_to_date('','%Y%m%d%H%i%s'),str_to_date('','%Y%m%d%H%i%s'),0,0,0,0,0,0,0,0) | inserting | 1 | 1 | 2 | 360 | 1 | 0 | 0 | REPEATABLE READ | 1 | 1 | NULL | 0 | 9989 | 0 | 0 |
| 343988274 | LOCK WAIT | 2016-12-08 19:46:00 | 343988274:88708:603:78 | 2016-12-08 19:46:00 | 2 | 448843 | insert into dcsdba.serv_proc_stat(PROC_ID,PROC_TM,SERV_CD,OPERATE_CD,RET_CD,SERV_START_TM,SERV_END_TM,TRANS_FLAG,PRD_CD,PROC_DT,INSTANCE) values('CDPAKPLAREDU06691.gz',STR_TO_DATE('','%Y%m%d%H%i%s'),'DirDetect','doDirDetect',0,STR_TO_DATE('','%Y%m%d%H%i%s'),STR_TO_DATE('','%Y%m%d%H%i%s'),0,'ARCH_DS_CD',20161208,'ic') | inserting | 1 | 1 | 2 | 360 | 1 | 0 | 0 | REPEATABLE READ | 1 | 1 | NULL | 0 | 10000 | 0 | 0 |
| 343988262 | LOCK WAIT | 2016-12-08 19:42:43 | 343988262:88356:4:165 | 2016-12-08 19:42:43 | 2 | 449958 | DELETE FROM dcsdba.rap_og_err_fatal WHERE tap_file_nm = 'CDARGTMHKGPP02959' AND (((biz_pkg = 1 or biz_pkg =3) AND SUBSTR(rap_og_file_nm, 13, 5) > LPAD(75,5,0)) OR (biz_pkg = 2 AND SUBSTR(rap_og_file_nm, 8, 5) > LPAD(75,5,0))) AND biz_line = 2 AND biz_pkg = 1 | starting index read | 1 | 1 | 2 | 360 | 1 | 0 | 0 | REPEATABLE READ | 1 | 1 | NULL | 0 | 0 | 0 | 0 |
| 343987421 | RUNNING | 2016-12-08 18:00:45 | NULL | NULL | 422172 | 448745 | NULL | NULL | 0 | 0 | 412899 | 37238312 | 15725948 | 9273 | 0 | REPEATABLE READ | 1 | 1 | NULL | 0 | 9903 | 0 | 0 |
+-----------+-----------+---------------------+--------------------------+---------------------+------------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+------------------+----------------------------+
走到这里结论已经非常明显了、事务 343987421 是由 448745 这个session发起的、并且它阻塞了343988284、343988282这两个对表dcsdba.og_file_audit
的插入操作。
不幸的是show processlist 中已经看不到448745这个session的执行的SQL了、幸运的是看到了它的状态是sleep 而且时长已经有3k多秒了、可以推测这个事务
已经完成了只是没有commit 也没有rollback;所以锁还没有释放。后来根据show processlist中的ip地址信息确认这条SQL是由人发起的不是程序代码里的SQL。