一 问题
Dubbo monitor所在服务器状态异常,iowait一直很高,load也一直很高,监控如下:
iowait如图:
load如图:
二 分析
通过iotop命令可以查看当前系统中磁盘io情况以及进程占用磁盘io的情况
从中可以定位到占用io进程的pid;
通过
cat /proc/${pid}/io
可以查看一个进程具体的读写状况;
通过
ps aux|grep ${pid}
可以查到这个进程具体的命令;
通过以上命令定位到进程为dubbo的monitor进程,用jstack打印线程栈发现处于RUNNABLE的进程除了
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
之外,有两个线程很可疑
"DubboMonitorTimer-thread-1" daemon prio=10 tid=0x00007f53b0593000 nid=0x363e runnable [0x00007f53ac36d000]
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:272)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
- locked <0x0000000779d506f8> (a java.io.FileReader)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:154)
at java.io.BufferedReader.readLine(BufferedReader.java:317)
- locked <0x0000000779d506f8> (a java.io.FileReader)
at java.io.BufferedReader.readLine(BufferedReader.java:382)
at com.alibaba.dubbo.monitor.simple.SimpleMonitorService.appendData(SimpleMonitorService.java:322)
at com.alibaba.dubbo.monitor.simple.SimpleMonitorService.draw(SimpleMonitorService.java:263)
at com.alibaba.dubbo.monitor.simple.SimpleMonitorService.access$300(SimpleMonitorService.java:64)
at com.alibaba.dubbo.monitor.simple.SimpleMonitorService$2.run(SimpleMonitorService.java:137)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Locked ownable synchronizers:
- <0x0000000788104268> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"DubboMonitorAsyncWriteLogThread" daemon prio=10 tid=0x00007f53b05ba000 nid=0x363d runnable [0x00007f53ac3af000]
java.lang.Thread.State: RUNNABLE
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
- locked <0x000000077a028918> (a java.io.FileWriter)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
at com.alibaba.dubbo.monitor.simple.SimpleMonitorService.write(SimpleMonitorService.java:213)
at com.alibaba.dubbo.monitor.simple.SimpleMonitorService.access$100(SimpleMonitorService.java:64)
at com.alibaba.dubbo.monitor.simple.SimpleMonitorService$1.run(SimpleMonitorService.java:120)
at java.lang.Thread.run(Thread.java:744)
Locked ownable synchronizers:
- None
这两个线程一个在read,一个在write,查看dubbo monitor的源代码发现:
monitor中会有两个线程,一个不停地写统计日志,一个每隔300s进行一次draw操作;
monitor中每天的日志都会保存在一个日志目录中,进入其中一天的日志目录查看日志大小:
发现一天的日志只有几十M,而上边iotop查看的结果是read是1520K/s,write是3.43M/s,所以排除write的问题,问题应该出在draw身上,查看draw的源代码发现draw会不断的读取统计日志同时进行绘图操作,
draw中会遍历日志目录,同时在appendData方法中会进行日志读取操作,由于monitor已经运行很长时间,所以按天保存的目录非常多,每次draw都需要去遍历这些目录并且进行读取文件操作,历史数据没有改动但却会不断的重新进行日志读取和绘图,问题应该出在这里;
三 解决
修改代码如下
即让draw只对当天的日志进行处理,修改之后服务器iowait如图:
load如图:
问题解决