resources
- 理解 %IOWAIT (%WIO)
- LINUX系统的CPU使用率和LOAD
- Linux Performance Observability Tools
- How Linux CPU Usage Time and Percentage is calculated
- Linux进程状态
man (on RHEL 7)
# man mpstat
%usr
Show the percentage of CPU utilization that occurred while executing at the user level (application).
%nice
Show the percentage of CPU utilization that occurred while executing at the user level with nice priority.
%sys
Show the percentage of CPU utilization that occurred while executing at the system level (kernel).Note that this does not include time spent servicing hardware and software interrupts.
%iowait
Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
%irq
Show the percentage of time spent by the CPU or CPUs to service hardware interrupts.
%soft
Show the percentage of time spent by the CPU or CPUs to service software interrupts.
%steal
Show the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
%guest
Show the percentage of time spent by the CPU or CPUs to run a virtual processor.
%gnice
Show the percentage of time spent by the CPU or CPUs to run a niced guest.
%idle
Show the percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.
# man top
us, user : time running un-niced user processes
sy, system : time running kernel processes
ni, nice : time running niced user processes
id, idle : time spent in the kernel idle handler
wa, IO-wait : time waiting for I/O completion
hi : time spent servicing hardware interrupts
si : time spent servicing software interrupts
st : time stolen from this vm by the hypervisor
TIPS
- CPU Usage Time and Percentage
参考 mpstat 手册,%usr + %nice + %sys + %iwoait + %irq + %soft + %steal + %guest + %gnice + %idle = 100%
%steal一般是在虚拟机中才能看到数值,比如CPU overcommitment很严重的VPS,而%guest和%nice一般都很低,
所以也可以根据/proc/stat或者top可得,user + nice + system + idle + iowait + irq + softirq + steal = 100
To calculate Linux CPU usage time subtract the idle CPU time from the total CPU time as follows:
Total CPU time since boot = user + nice + system + idle + iowait + irq + softirq + steal
Total CPU Idle time since boot = idle + iowait
Total CPU usage time since boot = (Total CPU time since boot) - (Total CPU Idle time since boot)
Total CPU percentage = (Total CPU usage time since boot)/(Total CPU time since boot X 100)
- Linux进程状态
运行状态(TASK_RUNNING):
是运行态和就绪态的合并,表示进程正在运行或准备运行,Linux 中使用TASK_RUNNING 宏表示此状态可中断睡眠状态(浅度睡眠)(TASK_INTERRUPTIBLE):
进程正在睡眠(被阻塞),等待资源到来是唤醒,也可以通过其他进程信号或时钟中断唤醒,进入运行队列。Linux 使用TASK_INTERRUPTIBLE 宏表示此状态。不可中断睡眠状态(深度睡眠状态)(TASK_UNINTERRUPTIBLE):
其和浅度睡眠基本类似,但有一点就是不可被其他进程信号或时钟中断唤醒。Linux 使用TASK_UNINTERRUPTIBLE 宏表示此状态。暂停状态(TASK_STOPPED):
进程暂停执行接受某种处理。如正在接受调试的进程处于这种状态,Linux 使用TASK_STOPPED 宏表示此状态。僵死状态(TASK_ZOMBIE):
进程已经结束但未释放PCB,Linux 使用TASK_ZOMBIE 宏表示此状态
- %iowait 的正确认知
%iowait 表示在一个采样周期内有百分之几的时间属于以下情况:CPU空闲、并且有仍未完成的I/O请求。
对 %iowait 常见的误解有两个:
一是误以为 %iowait 表示CPU不能工作的时间,
二是误以为 %iowait 表示I/O有瓶颈。
首先 %iowait 升高并不能证明等待I/O的进程数量增多了,也不能证明等待I/O的总时间增加了。
例如,在CPU繁忙期间发生的I/O,无论IO是多还是少,%iowait都不会变;当CPU繁忙程度下降时,有一部分IO落入CPU空闲时间段内,导致%iowait升高。
再比如,IO的并发度低,%iowait就高;IO的并发度高,%iowait可能就比较低。
可见%iowait是一个非常模糊的指标,如果看到 %iowait 升高,还需检查I/O量有没有明显增加,avserv/avwait/avque等指标有没有明显增大,应用有没有感觉变慢,如果都没有,就没什么好担心的。
- 查看CPU使用率,推荐如下Linux命令:
# top
# sar -u 1 5
# vmstat -n 1 5
# mpstat -P ALL 1 5
- 查看Load的值,推荐如下Linux命令:
# top
# uptime
# sar -q 1 5