1. First, use the top command to find the PID of the process that is using too much CPU
#top
top - 18:07:25 up 48 days, 1:07, 3 users, load average: 11.94, 11.90, 9.46
Tasks: 271 total, 1 running, 270 sleeping, 0 stopped, 0 zombie
%Cpu(s): 74.2 us, 0.8 sy, 0.0 ni, 24.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.2 st
KiB Mem : 65808884 total, 5901708 free, 46771732 used, 13135444 buff/cache
KiB Swap: 524284 total, 129448 free, 394836 used. 18397044 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
444 root 20 0 20.181g 0.011t 28220 S 1181 17.2 207:46.21 java
3631 root 20 0 19.415g 1.443g 28184 S 19.6 2.3 1:42.24 java
24528 root 20 0 10.739g 671532 27128 S 3.7 1.0 8:58.51 java
16952 root 20 0 20.416g 340340 6496 S 3.3 0.5 158:35.14 java
15584 root 20 0 144036 4036 2628 S 1.3 0.0 13:31.14 sshd
4717 root 20 0 2942716 1.188g 7652 S 1.0 1.9 761:57.40 java
25 root 20 0 0 0 0 S 0.3 0.0 84:26.67 rcu_sched
32 root 20 0 0 0 0 S 0.3 0.0 6:25.61 rcuos/6
2759 root 20 0 20.209g 1.985g 28308 S 0.3 3.2 1:46.17 java
3474 nginx 20 0 124032 3680 452 S 0.3 0.0 1:49.80 nginx
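PID 444 stands out: its CPU usage has reached 1181%. top reports this figure per core, so a multithreaded process can exceed 100%; 1181% corresponds to roughly 12 cores kept busy.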
2. Identify the busy threads
Run ps -mp pid -o THREAD,tid,time, where pid is the 444 found above:
#ps -mp 444 -o THREAD,tid,time
USER %CPU PRI SCNT WCHAN USER SYSTEM TID TIME
root 792 - - - - - - 01:18:40
root 0.0 19 - futex_ - - 444 00:00:00
root 4.0 19 - futex_ - - 480 00:00:24
root 58.5 19 - - - - 481 00:05:48
root 58.6 19 - - - - 482 00:05:49
root 58.5 19 - - - - 483 00:05:48
root 58.6 19 - - - - 484 00:05:49
root 58.5 19 - - - - 485 00:05:49
root 58.6 19 - - - - 486 00:05:49
root 58.5 19 - - - - 487 00:05:49
root 58.5 19 - - - - 488 00:05:49
root 58.6 19 - - - - 489 00:05:49
root 58.9 19 - - - - 490 00:05:51
root 58.5 19 - - - - 491 00:05:48
root 58.5 19 - - - - 492 00:05:48
root 58.6 19 - - - - 493 00:05:49
root 3.4 19 - futex_ - - 494 00:00:20
root 0.0 19 - futex_ - - 495 00:00:00
root 0.0 19 - futex_ - - 496 00:00:00
root 0.0 19 - futex_ - - 498 00:00:00
root 1.2 19 - futex_ - - 499 00:00:07
root 1.5 19 - futex_ - - 500 00:00:09
root 1.2 19 - futex_ - - 501 00:00:07
root 1.0 19 - futex_ - - 502 00:00:06
root 1.3 19 - futex_ - - 503 00:00:07
root 1.1 19 - futex_ - - 505 00:00:06
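Threads 481 through 493 (highlighted in red in the original post) each use about 58% CPU, while the other threads are nearly idle. These are the problem threads. As the next step shows, they are GC threads: the application's own threads are barely running while the JVM garbage-collects continuously, and that is what drives the CPU up.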
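With many threads, picking out the busy ones by eye gets tedious. The following is a minimal sketch of a filter for the ps output above. The function name hot_threads and the default threshold of 50% are illustrative choices, and the field positions ($2 for %CPU, $8 for TID) assume the exact column layout shown above.

```shell
#!/usr/bin/env bash

# hot_threads [THRESHOLD]: read `ps -mp <pid> -o THREAD,tid,time` output on
# stdin and print "TID %CPU" for every thread at or above THRESHOLD percent.
# Assumes the column order shown above: %CPU is field 2, TID is field 8.
hot_threads() {
    local threshold="${1:-50}"
    awk -v min="$threshold" '
        NR == 1   { next }              # skip the header row
        $8 == "-" { next }              # skip the per-process summary row (no TID)
        $2 + 0 >= min { print $8, $2 }  # keep only threads at or above the threshold
    '
}

# Usage (444 is the PID found with top):
#   ps -mp 444 -o THREAD,tid,time | hot_threads 50
```

Run against the output above, this would list TIDs 481 through 493 and skip the idle futex waiters.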
3. Inspect the stack of a problem thread
Convert the problem thread's TID to hexadecimal:
#printf "%x\n" 485
1e5
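The conversion can be wrapped in a small helper so that several TIDs are converted at once and the result is ready for the jstack search that follows. This is only a sketch: the function names tid_to_nid and thread_name_for_tid, and the idea of saving the jstack output to a file first, are assumptions for illustration. Searching for the whole nid=0x... token rather than the bare hex digits avoids accidental matches inside memory addresses or longer IDs.

```shell
#!/usr/bin/env bash

# tid_to_nid TID...: print each decimal thread ID as the "nid=0x..." token
# that jstack prints for the native thread ID.
tid_to_nid() {
    local tid
    for tid in "$@"; do
        printf 'nid=0x%x\n' "$tid"
    done
}

# thread_name_for_tid TID DUMP_FILE: print the name of the thread in a saved
# jstack dump whose nid matches TID. DUMP_FILE is a hypothetical file created
# with something like: jstack 444 > /tmp/jstack.444.txt
thread_name_for_tid() {
    local nid
    nid="$(tid_to_nid "$1")"
    # -w stops nid=0x1e5 from also matching a longer ID such as nid=0x1e5a.
    grep -w -- "$nid" "$2" | sed 's/^"\([^"]*\)".*/\1/'
}

# Usage:
#   tid_to_nid 481 485 493
#   jstack 444 > /tmp/jstack.444.txt
#   thread_name_for_tid 485 /tmp/jstack.444.txt
```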
Use jstack to dump the thread stacks and search for the hexadecimal TID (jstack prints it as nid):
#jstack pid | grep tid
#jstack 444 | grep 1e5
"GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007f87d402b800 nid=0x1e5 runnable
Use jstat to check the process's heap usage (10 samples, 2000 ms apart):
#jstat -gcutil 444 2000 10
S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
0.00 0.00 100.00 99.99 97.42 95.76 23 40.676 119 430.327 471.004
0.00 0.00 100.00 99.99 97.42 95.76 23 40.676 119 430.327 471.004
0.00 0.00 100.00 99.99 97.43 95.77 23 40.676 120 434.709 475.385
0.00 0.00 100.00 99.99 97.43 95.77 23 40.676 120 434.709 475.385
0.00 0.00 100.00 99.99 97.43 95.77 23 40.676 120 434.709 475.385
0.00 0.00 100.00 99.99 97.45 95.81 23 40.676 121 439.928 480.604
0.00 0.00 100.00 99.99 97.46 95.81 23 40.676 122 442.248 482.924
0.00 0.00 100.00 99.99 97.46 95.81 23 40.676 122 442.248 482.924
0.00 0.00 100.00 99.99 97.46 95.81 23 40.676 122 442.248 482.924
0.00 0.00 100.00 99.99 97.46 95.82 23 40.676 123 447.374 488.051
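E is the utilization of the Eden space (young generation) and O is the utilization of the old generation, both as percentages. Both are pinned at about 100% while FGC climbs from 119 to 123 within the sampling window and YGC stays at 23, so full GCs are running back to back without freeing memory. That is what keeps the GC threads busy.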
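To put a number on the GC pressure, the first and last jstat samples can be compared. The sketch below is one way to do that; the function name gc_pressure is hypothetical, and the field positions (E in column 3, O in column 4, FGC in column 9, FGCT in column 10) assume the -gcutil column layout shown above.

```shell
#!/usr/bin/env bash

# gc_pressure: read `jstat -gcutil` output on stdin and report the latest
# Eden/old utilization plus how many full GCs ran, and how many seconds they
# took, between the first and the last sample.
# Assumes the column layout above: E=$3, O=$4, FGC=$9, FGCT=$10.
gc_pressure() {
    awk '
        NR == 1 { next }                          # skip the header row
        NR == 2 { fgc0 = $9; fgct0 = $10 }        # remember the first sample
        { e = $3; o = $4; fgc = $9; fgct = $10 }  # keep the latest sample
        END {
            if (NR < 2) exit 1                    # no samples to compare
            printf "eden=%s%% old=%s%% full_gcs=%d full_gc_seconds=%.3f\n",
                   e, o, fgc - fgc0, fgct - fgct0
        }
    '
}

# Usage:
#   jstat -gcutil 444 2000 10 | gc_pressure
```

For the samples above this reports 4 full GCs taking about 17 seconds, out of roughly 18 seconds between the first and last sample, which means the JVM spent nearly the whole window in full GC.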