1. Sources of Linux performance data
1.1 Real-time CPU performance data: /proc/stat
/proc/stat provides real-time system-wide statistics, including the time the CPU has spent in each state, the number of hardware and software interrupts, the number of context switches, and several process-related counters. Its fields are described below.
1、cpu
The amount of time, measured in units of USER_HZ (1/100ths of a second on most architectures, use sysconf(_SC_CLK_TCK) to obtain the right value), that the system spent in various states:
Explanation:
These values are recorded in units of clock ticks; on most machines one clock tick is 1/100 of a second, i.e. 10 ms. For a given machine the tick rate can be obtained with sysconf(_SC_CLK_TCK) (note: this function is declared in unistd.h, which must be included before use).
Correction:
Some blog posts confuse clock ticks with jiffies and convert these values to seconds using the HZ obtained with cat /boot/config-`uname -r` | grep '^CONFIG_HZ='. In my tests the two do not agree: on my machine CONFIG_HZ is 1000 while sysconf(_SC_CLK_TCK) returns 100.
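For reference, a minimal sketch (Python; os.sysconf here plays the role of the C sysconf(_SC_CLK_TCK) call) that prints the tick rate used by /proc/stat:

    import os

    # USER_HZ, the unit used by the cumulative counters in /proc/stat.
    # Equivalent to sysconf(_SC_CLK_TCK) from <unistd.h> in C; typically 100.
    clk_tck = os.sysconf("SC_CLK_TCK")
    print("clock ticks per second:", clk_tck)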
The CPU fields are explained below (a short parsing sketch for the whole file follows the softirq entry at the end of this section):
user (1) Time spent in user mode.
Explanation: time the CPU spent in user mode.
Correction: some explain this field as user-mode time excluding threads whose nice value is negative. That is not right; the original English text contains no such qualification.
nice (2) Time spent in user mode with low priority (nice).
Explanation: time the CPU spent running low-priority (niced) user processes.
Correction: many describe this as the time spent on processes with a negative nice value. The original text clearly says low priority, and a smaller nice value means a higher priority; ordinary processes run with nice 0. A process's nice value can be changed with:
renice -n <nice> -p <pid>
In my tests, when a running thread has a positive nice value this field increases, while a running thread with a negative nice value leaves it unchanged, which shows that the common interpretation is wrong. I also found that after setting the current nice value to a positive number, the user field in this file stops growing rapidly (no longer by hundreds per sample), meaning the execution time of low-priority programs is not accounted to user. We can therefore restate the user and nice fields as:
user: time the CPU spends executing higher-priority user code (nice <= 0)
nice: time the CPU spends executing low-priority user code (nice > 0)
system (3) Time spent in system mode.
Explanation: time the CPU spent in kernel (system) mode.
idle (4) Time spent in the idle task. This value should be USER_HZ times the second entry in the /proc/uptime pseudo-file.
Explanation: literally, the time spent in the idle task. Linux has a special process, the idle task, with PID 0; the time spent in it can be taken as the time the system was idle. The text adds that this value should equal USER_HZ times the second field of /proc/uptime, and checking this confirmed that the HZ value from cat /boot/config-`uname -r` | grep '^CONFIG_HZ=' is not the right unit for these counters.
iowait (since Linux 2.5.41) (5) Time waiting for I/O to complete.
Explanation: time the CPU spent waiting for I/O to complete; nothing special to add.
irq (since Linux 2.6.0-test4) (6) Time servicing interrupts.
Explanation: time the CPU spent servicing (hardware) interrupts.
softirq (since Linux 2.6.0-test4) (7) Time servicing softirqs.
Explanation: time the CPU spent servicing software interrupts (softirqs).
steal (since Linux 2.6.11) (8) Stolen time, which is the time spent in other operating systems when running in a virtualized environment
Explanation: stolen time, i.e. CPU time taken by other operating systems when running in a virtualized environment.
guest (since Linux 2.6.24) (9) Time spent running a virtual CPU for guest operating systems under the control of the Linux kernel.
Explanation: time spent running a virtual CPU for guest operating systems under the control of the Linux kernel.
2、intr
This line shows counts of interrupts serviced since boot time, for each of the possible system interrupts. The first column is the total of all interrupts serviced including unnumbered architecture specific interrupts; each subsequent column is the total for that particular numbered interrupt. Unnumbered interrupts are not shown, only summed into the total.
Explanation: counts of hardware interrupts serviced since boot, by type. The first column is the total of all interrupts serviced, including both numbered and unnumbered architecture-specific ones; each following column is the count for one numbered interrupt. Unnumbered interrupts are not listed individually, only included in the total. (For what the individual interrupts mean, see /proc/interrupts.)
3、ctxt
The number of context switches that the system underwent.
Explanation: the total number of context switches the system has performed.
4、btime
boot time, in seconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC).
Explanation: btime is the boot time of the system, in seconds since the Linux Epoch, 1970-01-01 00:00:00 +0000 (UTC).
Correction: some posts explain this field as the time elapsed since boot, which is wrong and misleading.
5、processes
Number of forks since boot.
Explanation: the number of processes created (forked) since boot (Linux creates processes with fork()).
6、procs_running
Number of processes in runnable state. (Linux 2.5.45 onward.)
Explanation: the number of processes in the runnable state (this should include both running and ready-to-run processes).
7、procs_blocked
Number of processes blocked waiting for I/O to complete. (Linux 2.5.45 onward.)
Explanation: the number of processes blocked waiting for I/O to complete.
8、Softirq
Explanation: probably because of the manual version, this field is not documented, but its meaning is clear from the name: the counts of the different kinds of software interrupts. See /proc/softirqs; its layout is a bit dense but readable.
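Putting the fields above together, here is a minimal parsing sketch (Python) for /proc/stat. The exact number of cpu columns depends on the kernel version, so the sketch simply maps whatever columns are present:

    # Sketch: read /proc/stat and expose the aggregate cpu counters plus the
    # scalar lines described above. All values are cumulative since boot.
    CPU_FIELDS = ["user", "nice", "system", "idle", "iowait",
                  "irq", "softirq", "steal", "guest", "guest_nice"]

    def read_proc_stat(path="/proc/stat"):
        stats = {}
        with open(path) as f:
            for line in f:
                parts = line.split()
                key, values = parts[0], parts[1:]
                if key == "cpu":                      # system-wide cpu line
                    stats["cpu"] = dict(zip(CPU_FIELDS, map(int, values)))
                elif key in ("ctxt", "btime", "processes",
                             "procs_running", "procs_blocked"):
                    stats[key] = int(values[0])
        return stats

    print(read_proc_stat())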
1.2 Real-time memory performance data: /proc/meminfo
Real-time memory metrics come mainly from /proc/meminfo. As before, the Linux programmer's manual is used to explain the meaning of each field.
This file reports statistics about memory usage on the system. It is used by free(1) to report the amount of free and used memory (both physical and swap) on the system as well as the shared memory and buffers used by the kernel. Each line of the file consists of a parameter name, followed by a colon, the value of the parameter, and an option unit of measurement (e.g., "kB"). The list below describes the parameter names and the format specifier required to read the field value. Except as noted below, all of the fields have been present since at least Linux 2.6.0. Some fields are displayed only if the kernel was configured with various options; those dependencies are noted in the list.
Explanation: this paragraph is a general introduction to the file. free(1) uses it to report the amounts of free and used memory (both physical and swap) as well as the shared memory and buffers used by the kernel. Each line consists of a parameter name, a colon, the value, and an optional unit (e.g. "kB"). The set of fields may differ depending on the kernel version and on how the kernel was configured.
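A minimal sketch (Python) that parses /proc/meminfo into a dictionary according to the name/colon/value layout just described; values are returned in the unit printed by the kernel (kB for most fields, plain counts for the HugePages_* fields):

    def read_meminfo(path="/proc/meminfo"):
        info = {}
        with open(path) as f:
            for line in f:
                name, rest = line.split(":", 1)
                info[name.strip()] = int(rest.split()[0])
        return info

    info = read_meminfo()
    print(info["MemTotal"], info["MemFree"])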
1、MemTotal %lu
Total usable RAM (i.e., physical RAM minus a few reserved bits and the kernel binary code).
Explanation: total usable RAM, i.e. physical memory (as opposed to swap) minus a few reserved bits and the kernel binary code. The %lu is the format specifier needed to read the value (unsigned long); the same applies below. (This equals HighTotal + LowTotal.)
2、MemFree %lu
The sum of LowFree+HighFree.
Explanation: the amount of free memory; equals HighFree + LowFree.
3、Buffers %lu
Relatively temporary storage for raw disk blocks that shouldn't get tremendously large (20MB or so).
4、Cached %lu
In-memory cache for files read from the disk (the page cache). Doesn't include SwapCached.
Explanation: Buffers and Cached are both in-memory caches of on-disk data. Buffers holds raw disk blocks and relates to block devices (for example, reading or writing blocks directly with dd), while Cached is the page cache for file data. One works at the physical (block) level, the other at the logical (file) level; for a detailed comparison see http://soft.chinabyte.com/os/50/12301550.shtml. Also note that Cached does not include SwapCached; see the next field.
5、SwapCached %lu
Memory that once was swapped out, is swapped back in but still also is in the swap file. (If memory pressure is high, these pages don't need to be swapped out again because they are already in the swap file. This saves I/O.)
Explanation: memory that was once swapped out and has been swapped back in, but still also resides in the swap file (the swap cache). If memory pressure is high, these pages do not need to be written out again because they are already in the swap file, which saves I/O. (Strictly speaking this field belongs to swap rather than to RAM usage.)
6、Active %lu
Memory that has been used more recently and usually not reclaimed unless absolutely necessary.
Inactive %lu
Memory which has been less recently used. It is more eligible to be reclaimed for other purposes.
Explanation: Active/Inactive distinguish memory that has been used recently from memory that has not. Recently used (active) memory is usually not reclaimed unless absolutely necessary; inactive memory is more eligible to be reclaimed for other purposes.
7、HighTotal %lu
(Starting with Linux 2.6.19, CONFIG_HIGHMEM is required.) Total amount of highmem. Highmem is all memory above ~860MB of physical memory. Highmem areas are for use by user-space programs, or for the page cache. The kernel must use tricks to access this memory, making it slower to access than lowmem.
8、HighFree %lu
(Starting with Linux 2.6.19, CONFIG_HIGHMEM is required.) Amount of free highmem.
Explanation: HighTotal and HighFree are the total and free amounts of highmem. Highmem is all physical memory above roughly 860 MB; it is used for user-space programs and for the page cache. The kernel has to use special tricks to access this memory, which makes it slower to access than lowmem.
9、LowTotal %lu
(Starting with Linux 2.6.19, CONFIG_HIGHMEM is required.) Total amount of lowmem. Lowmem is memory which can be used for everything that highmem can be used for, but it is also available for the kernel's use for its own data structures. Among many other things, it is where everything from Slab is allocated. Bad things happen when you're out of lowmem.
10、LowFree %lu
(Starting with Linux 2.6.19, CONFIG_HIGHMEM is required.) Amount of free lowmem.
Explanation: LowTotal and LowFree are the total and free amounts of lowmem. Lowmem can hold everything highmem can, and in addition it is available for the kernel's own data structures; among many other things, everything allocated from Slab lives here. Slab is the cache for kernel data structures.
11、SwapTotal %lu
Total amount of swap space available.
12、SwapFree %lu
Amount of swap space that is currently unused.
Explanation: the swap space (the backing store for virtual memory): its total size and the amount currently unused.
13、Dirty %lu
Memory which is waiting to get written back to the disk.
14、Writeback %lu
Memory which is actively being written back to the disk.
Explanation: Dirty is memory waiting to be written back to disk (dirty data); Writeback is memory that is actively being written back.
15、AnonPages %lu (since Linux 2.6.18)
Non-file backed pages mapped into user-space pagetables.
16、Mapped %lu
Files which have been mapped into memory (with mmap(2)), such as libraries.
17、Shmem %lu (since Linux 2.6.32)
[To be documented.]
18、Slab %lu
In-kernel data structures cache.
Explanation: the cache of in-kernel data structures, allocated from lowmem.
19、SReclaimable %lu (since Linux 2.6.19)
Part of Slab, that might be reclaimed, such as caches.
20、SUnreclaim %lu (since Linux 2.6.19)
Part of Slab, that cannot be reclaimed on memory pressure.
Explanation: SReclaimable and SUnreclaim (the leading S stands for Slab) are the parts of Slab that can and cannot be reclaimed under memory pressure, respectively.
21、KernelStack %lu (since Linux 2.6.32)
Amount of memory allocated to kernel stacks.
Explanation: the amount of memory allocated to kernel stacks; for background on kernel stacks see http://19880512.blog.51cto.com/936364/274610.
22、PageTables %lu (since Linux 2.6.18)
Amount of memory dedicated to the lowest level of page tables.
Explanation: the amount of memory dedicated to the lowest level of page tables.
23、Quicklists %lu (since Linux 2.6.27)
(CONFIG_QUICKLIST is required.) [To be documented.]
24、NFS_Unstable %lu (since Linux 2.6.18)
NFS pages sent to the server, but not yet committed to stable storage.
Explanation: the total size of NFS (Network File System) pages that have been sent to the server but not yet committed to stable storage.
25、Bounce %lu (since Linux 2.6.18)
Memory used for block device "bounce buffers".
Explanation: the amount of memory used for block-device "bounce buffers"; for background see http://blog.****.net/force_eagle/article/details/7723772.
26、WritebackTmp %lu (since Linux 2.6.26)
Memory used by FUSE for temporary writeback buffers.
Explanation: the amount of memory used by FUSE (Filesystem in Userspace) for temporary writeback buffers; for background on FUSE see http://www.cnblogs.com/codestub/archive/2011/08/18/2144190.html.
27、CommitLimit %lu (since Linux 2.6.10)
This is the total amount of memory currently available to be allocated on the system, expressed in kilobytes. This limit is adhered to only if strict overcommit accounting is enabled (mode 2 in /proc/sys/vm/overcommit_memory). The limit is calculated according to the formula described under /proc/sys/vm/overcommit_memory. For further details, see the kernel source file Documentation/vm/overcommit- accounting.
Explanation: the total amount of memory currently available to be allocated on the system (covering both physical memory and swap), in kB. This limit is only enforced when strict overcommit accounting is enabled (value 2 in /proc/sys/vm/overcommit_memory); the formula is described under /proc/sys/vm/overcommit_memory at http://man7.org/linux/man-pages/man5/proc.5.html.
28、Committed_AS %lu
The amount of memory presently allocated on the system. The committed memory is a sum of all of the memory which has been allocated by processes, even if it has not been "used" by them as of yet. A process which allocates 1GB of memory (using malloc(3) or similar), but touches only 300MB of that memory will show up as using only 300MB of memory even if it has the address space allocated for the entire 1GB. This 1GB is memory which has been "committed" to by the VM and can be used at any time by the allocating application. With strict overcommit enabled on the system (mode 2 in /proc/sys/vm/overcommit_memory), allocations which would exceed the CommitLimit will not be permitted. This is useful if one needs to guarantee that processes will not fail due to lack of memory once that memory has been successfully allocated.
Explanation: the amount of memory currently allocated on the system (physical plus swap), i.e. the sum of all memory allocated by processes, even the parts they have not yet used. The example in the text: a process that allocates 1 GB but touches only 300 MB still contributes the full 1 GB here, because that 1 GB has been "committed" by the VM and can be used by the process at any time. With strict overcommit enabled (value 2 in /proc/sys/vm/overcommit_memory), allocations that would exceed CommitLimit are refused.
29、VmallocTotal %lu
Total size of vmalloc memory area.
30、VmallocUsed %lu
Amount of vmalloc area which is used.
31、VmallocChunk %lu
Largest contiguous block of vmalloc area which is free.
Explanation: the three fields above describe the vmalloc area: its total size, the amount in use, and the largest contiguous free block.
32、HardwareCorrupted %lu (since Linux 2.6.32)
(CONFIG_MEMORY_FAILURE is required.) [To be documented.]
AnonHugePages %lu (since Linux 2.6.38) (CONFIG_TRANSPARENT_HUGEPAGE is required.) Non-file backed huge pages mapped into user-space page tables.
33、HugePages_Total %lu
(CONFIG_HUGETLB_PAGE is required.) The size of the pool of huge pages.
34、HugePages_Free %lu
(CONFIG_HUGETLB_PAGE is required.) The number of huge pages in the pool that are not yet allocated.
35、HugePages_Rsvd %lu (since Linux 2.6.17)
(CONFIG_HUGETLB_PAGE is required.) This is the number of huge pages for which a commitment to allocate from the pool has been made, but no allocation has yet been made. These reserved huge pages guarantee that an application will be able to allocate a huge page from the pool of huge pages at fault time.
36、HugePages_Surp %lu (since Linux 2.6.24)
(CONFIG_HUGETLB_PAGE is required.) This is the number of huge pages in the pool above the value in /proc/sys/vm/nr_hugepages. The maximum number of surplus huge pages is controlled by /proc/sys/vm/nr_overcommit_hugepages.
37、Hugepagesize %lu
(CONFIG_HUGETLB_PAGE is required.) The size of huge pages.
Explanation: the huge-page configuration: the total number of huge pages in the pool, the number not yet allocated, the number reserved, the number of surplus huge pages, and the size of a huge page.
1.3 Real-time disk performance data: /proc/diskstats
For this file the Linux programmer's manual refers to the kernel document Documentation/iostats.txt: https://www.kernel.org/doc/Documentation/iostats.txt. The key points of that document are used below to explain the fields of /proc/diskstats.
Each disk line in the file has 14 columns: the first three are the major device number, the minor device number and the device name, and the remaining 11 fields are explained as follows (a parsing sketch follows the last field).
Field 1 -- # of reads completed
This is the total number of reads completed successfully.
Field 1: number of reads completed, i.e. the total number of reads completed successfully.
Field 2 -- # of reads merged, field 6 -- # of writes merged
Reads and writes which are adjacent to each other may be merged for efficiency. Thus two 4K reads may become one 8K read before it is ultimately handed to the disk, and so it will be counted (and queued) as only one I/O.
Field 2: number of reads merged; Field 6: number of writes merged. Adjacent reads and writes may be merged for efficiency, so two 4K reads may become a single 8K read before it is ultimately handed to the disk, and it is then counted (and queued) as only one I/O.
Field 3 -- # of sectors read
This is the total number of sectors read successfully.
Field 3: number of sectors read, i.e. the total number of sectors read successfully.
Field 4 -- # of milliseconds spent reading
This is the total number of milliseconds spent by all reads (as measured from __make_request() to end_that_request_last()).
Field 4: milliseconds spent reading, i.e. the total time spent by all reads (measured from __make_request() to end_that_request_last()).
Field 5 -- # of writes completed
This is the total number of writes completed successfully.
Field 5: number of writes completed, i.e. the total number of writes completed successfully.
Field 6 -- # of writes merged
See the description of field 2.
Field 6: number of writes merged; see the description of Field 2.
Field 7 -- # of sectors written
This is the total number of sectors written successfully.
Field 7: number of sectors written, i.e. the total number of sectors written successfully.
Field 8 -- # of milliseconds spent writing
This is the total number of milliseconds spent by all writes (as measured from __make_request() to end_that_request_last()).
Field 8: milliseconds spent writing, i.e. the total time spent by all writes.
Field 9 -- # of I/Os currently in progress
The only field that should go to zero. Incremented as requests are given to appropriate struct request_queue and decremented as they finish.
Field 9: number of I/Os currently in progress. It is incremented as requests are handed to the appropriate request queue and decremented as they complete; it is the only field that should ever drop back to zero.
Field 10 -- # of milliseconds spent doing I/Os
This field increases so long as field 9 is nonzero.
Field 10: total milliseconds spent doing I/O; this field keeps increasing as long as Field 9 is nonzero, i.e. while there are outstanding I/O requests.
Field 11 -- weighted # of milliseconds spent doing I/Os This field is incremented at each I/O start, I/O completion, I/O merge, or read of these stats by the number of I/Os in progress (field 9) times the number of milliseconds spent doing I/O since the last update of this field. This can provide an easy measure of both I/O completion time and the backlog that may be accumulating.
Field 11: weighted milliseconds spent doing I/O. At each I/O start, I/O completion, I/O merge, or read of these statistics, the field is incremented by the number of I/Os in progress (Field 9) times the milliseconds spent doing I/O since the last update. This gives an easy measure of both I/O completion time and the backlog that may be accumulating.
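A minimal sketch (Python) that maps each /proc/diskstats line onto the 11 fields described above; newer kernels may append extra columns, which the sketch simply ignores, and "sda" is only an example device name:

    DISK_FIELDS = ["reads", "reads_merged", "sectors_read", "ms_reading",
                   "writes", "writes_merged", "sectors_written", "ms_writing",
                   "ios_in_progress", "ms_doing_io", "weighted_ms_doing_io"]

    def read_diskstats(path="/proc/diskstats"):
        disks = {}
        with open(path) as f:
            for line in f:
                parts = line.split()
                # parts[0:3] are the major number, minor number and device name.
                disks[parts[2]] = dict(zip(DISK_FIELDS, map(int, parts[3:14])))
        return disks

    print(read_diskstats().get("sda"))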
1.4 Real-time network performance data: /proc/net/dev
The file is fairly self-explanatory: it has two halves, receive statistics on the left and transmit statistics on the right, followed by one line of counters per network interface. lo is the local loopback interface and eth0 is a physical network interface. I verified that running ping 127.0.0.1 makes the lo counters jump, while ping baidu.com increases the eth0 packet counters.
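A minimal sketch (Python) that extracts the per-interface receive/transmit counters from /proc/net/dev (the first two lines of the file are headers):

    def read_net_dev(path="/proc/net/dev"):
        stats = {}
        with open(path) as f:
            for line in f.readlines()[2:]:        # skip the two header lines
                iface, data = line.split(":", 1)
                v = list(map(int, data.split()))
                stats[iface.strip()] = {
                    "rx_bytes": v[0], "rx_packets": v[1], "rx_errs": v[2],
                    "tx_bytes": v[8], "tx_packets": v[9], "tx_errs": v[10],
                }
        return stats

    print(read_net_dev().get("lo"))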
1.5 Process performance data: /proc/<pid>/stat and /proc/<pid>/io
Many tools provide real-time monitoring of process performance, and their data comes mainly from /proc/<pid>/stat. The fields of that file are explained below, based on the manual.
(1) pid %d
The process ID.
(2) comm %s
The filename of the executable, in parentheses. This is visible whether or not the executable is swapped out.
(3) state %c
One of the following characters, indicating process state:
R Running
S Sleeping in an interruptible wait
D Waiting in uninterruptible disk sleep
Z Zombie
T Stopped (on a signal) or (before Linux 2.6.33) trace stopped
t Tracing stop (Linux 2.6.33 onward)
W Paging (only before Linux 2.6.0)
X Dead (from Linux 2.6.0 onward)
x Dead (Linux 2.6.33 to 3.13 only)
K Wakekill (Linux 2.6.33 to 3.13 only)
W Waking (Linux 2.6.33 to 3.13 only)
P Parked (Linux 3.9 to 3.13 only)
Explanation: the first three fields are the process ID, the command name of the executable (in parentheses), and the process state. (A parsing sketch that handles the parenthesized comm field follows field (52) below.)
(4) ppid %d
The PID of the parent of this process.
Explanation: the PID of the parent process.
(5) pgrp %d
The process group ID of the process.
Explanation: the process group ID of the process.
(6) session %d
The session ID of the process.
Explanation: the session ID of the process.
(7) tty_nr %d
The controlling terminal of the process. (The minor device number is contained in the combination of bits 31 to 20 and 7 to 0; the major device number is in bits 15 to 8.)
Explanation: the device number of the process's controlling terminal, packed into 32 bits: the major device number is in bits 15-8, and the minor device number is split across bits 31-20 and 7-0 (an admittedly odd encoding).
(8) tpgid %d
The ID of the foreground process group of the controlling terminal of the process.
Explanation: the ID of the foreground process group of the process's controlling terminal.
(9) flags %u
The kernel flags word of the process. For bit meanings, see the PF_* defines in the Linux kernel source file include/linux/sched.h. Details depend on the kernel version. The format for this field was %lu before Linux 2.6.
Explanation: the kernel flags word of the process; for the bit meanings see the PF_* defines in include/linux/sched.h in the kernel source.
(10) minflt %lu
The number of minor faults the process has made which have not required loading a memory page from disk.
Explanation: the number of minor page faults, i.e. faults that did not require loading a page from disk (for example, the page was already in the page cache), so no disk I/O was triggered.
(11) cminflt %lu
The number of minor faults that the process's waited-for children have made.
Explanation: the number of minor page faults made by the process's waited-for children. "Waited-for children" are child processes that have terminated and been reaped by the parent with wait(2)/waitpid(2); when a child is reaped, its fault and time counters are folded into the parent's c* fields. (Note the past participle: the children are the ones being waited for by the parent.)
(12) majflt %lu
The number of major faults the process has made which have required loading a memory page from disk.
Explanation: the number of major page faults, i.e. faults that required loading a page from disk.
(13) cmajflt %lu
The number of major faults that the process's waited-for children have made.
Explanation: the number of major page faults made by the process's waited-for children; same meaning as above.
(14) utime %lu
Amount of time that this process has been scheduled in user mode, measured in clock ticks (divide by sysconf(_SC_CLK_TCK)). This includes guest time, guest_time (time spent running a virtual CPU, see below), so that applications that are not aware of the guest time field do not lose that time from their calculations.
Explanation: the time this process has been scheduled in user mode, in clock ticks (obtained with sysconf(_SC_CLK_TCK); see the earlier discussion of this unit).
(15) stime %lu
Amount of time that this process has been scheduled in kernel mode, measured in clock ticks (divide by sysconf(_SC_CLK_TCK)).
Explanation: the time this process has been scheduled in kernel mode, in clock ticks.
(16) cutime %ld
Amount of time that this process's waited-for children have been scheduled in user mode, measured in clock ticks (divide by sysconf(_SC_CLK_TCK)). (See also times(2).) This includes guest time, cguest_time (time spent running a virtual CPU, see below).
Explanation: the time this process's waited-for children have been scheduled in user mode.
(17) cstime %ld
Amount of time that this process's waited-for children have been scheduled in kernel mode, measured in clock ticks (divide by sysconf(_SC_CLK_TCK)).
Explanation: the time this process's waited-for children have been scheduled in kernel mode.
(18) priority %ld
(Explanation for Linux 2.6) For processes running a real-time scheduling policy (policy below; see sched_setscheduler(2)), this is the negated scheduling priority, minus one; that is, a number in the range -2 to -100, corresponding to real-time priorities 1 to 99. For processes running under a non-real-time scheduling policy, this is the raw nice value (setpriority(2)) as represented in the kernel. The kernel stores nice values as numbers in the range 0 (high) to 39 (low), corresponding to the user-visible nice range of -20 to 19. Before Linux 2.6, this was a scaled value based on the scheduler weighting given to this process.
Explanation: the priority of the process. For processes under a real-time scheduling policy this is the negated scheduling priority minus one, i.e. a number in the range -2 to -100 corresponding to real-time priorities 1 to 99. For non-real-time processes it is the raw nice value as stored by the kernel, in the range 0 (high) to 39 (low), corresponding to the user-visible nice range of -20 to 19.
(19) nice %ld
The nice value (see setpriority(2)), a value in the range 19 (low priority) to -20 (high priority).
Explanation: the nice value of the process, from 19 (low priority) to -20 (high priority).
(20) num_threads %ld
Number of threads in this process (since Linux 2.6). Before kernel 2.6, this field was hard coded to 0 as a placeholder for an earlier removed field.
Explanation: the number of threads in this process. Before kernel 2.6 this field was hard-coded to 0.
(21) itrealvalue %ld
The time in jiffies before the next SIGALRM is sent to the process due to an interval timer. Since kernel 2.6.17, this field is no longer maintained, and is hard coded as 0.
Explanation: the time in jiffies before the next SIGALRM is sent to the process due to an interval timer. Since kernel 2.6.17 this field is no longer maintained and is hard-coded to 0, so it can be ignored.
(22) starttime %llu
The time the process started after system boot. In kernels before Linux 2.6, this value was expressed in jiffies. Since Linux 2.6, the value is expressed in clock ticks (divide by sysconf(_SC_CLK_TCK)). The format for this field was %lu before Linux 2.6.
Explanation: the time the process started after system boot, expressed in jiffies before Linux 2.6 and in clock ticks from 2.6 on. (This also shows that jiffies and clock ticks are two different units: on my machine 1 jiffy = 1/1000 s while 1 clock tick = 1/100 s. Many blogs confuse the two.)
(23) vsize %lu
Virtual memory size in bytes.
Explanation: the size of the process's virtual memory, in bytes.
(24) rss %ld
Resident Set Size: number of pages the process has in real memory. This is just the pages which count toward text, data, or stack space. This does not include pages which have not been demand-loaded in, or which are swapped out.
Explanation: the resident set size, i.e. the number of pages the process has in physical memory; only text, data and stack pages count, not pages that have not been demand-loaded or that have been swapped out.
(25) rsslim %lu
Current soft limit in bytes on the rss of the process; see the description of RLIMIT_RSS in getrlimit(2).
Explanation: the current soft limit, in bytes, on the process's RSS, i.e. the most physical memory the process is allowed to keep resident; it is often set to a value larger than physical memory (see RLIMIT_RSS in getrlimit(2)).
(26) startcode %lu
The address above which program text can run.
Explanation: the start address of the executable code (text) segment in the process's virtual address space.
(27) endcode %lu
The address below which program text can run.
Explanation: the end address of the executable code (text) segment.
(28) startstack %lu
The address of the start (i.e., bottom) of the stack.
Explanation: the start (bottom) address of the process stack.
(29) kstkesp %lu
The current value of ESP (stack pointer), as found in the kernel stack page for the process.
Explanation: the current value of the stack pointer (ESP), as found in the kernel stack page of the process.
(30) kstkeip %lu
The current EIP (instruction pointer).
Explanation: the current value of the instruction pointer (EIP).
(31) signal %lu
The bitmap of pending signals, displayed as a decimal number. Obsolete, because it does not provide information on real-time signals; use /proc/[pid]/status instead.
Explanation: the bitmap of pending signals, shown as a decimal number. Obsolete, because it cannot represent real-time signals; use /proc/[pid]/status instead.
(32) blocked %lu
The bitmap of blocked signals, displayed as a decimal number. Obsolete, because it does not provide information on real-time signals; use /proc/[pid]/status instead.
Explanation: the bitmap of blocked signals; obsolete for the same reason.
(33) sigignore %lu
The bitmap of ignored signals, displayed as a decimal number. Obsolete, because it does not provide information on real-time signals; use /proc/[pid]/status instead.
Explanation: the bitmap of ignored signals; obsolete.
(34) sigcatch %lu
The bitmap of caught signals, displayed as a decimal number. Obsolete, because it does not provide information on real-time signals; use /proc/[pid]/status instead.
Explanation: the bitmap of caught signals; obsolete.
(35) wchan %lu
This is the "channel" in which the process is waiting. It is the address of a location in the kernel where the process is sleeping. The corresponding symbolic name can be found in /proc/[pid]/wchan.
Explanation: the "channel" the process is waiting in, i.e. the address of the kernel location where the process is sleeping; the corresponding symbolic name can be read from /proc/[pid]/wchan.
(36) nswap %lu
Number of pages swapped (not maintained).
Explanation: the number of pages swapped; not maintained.
(37) cnswap %lu
Cumulative nswap for child processes (not maintained).
Explanation: the cumulative nswap of the child processes; not maintained.
(38) exit_signal %d (since Linux 2.1.22)
Signal to be sent to parent when we die.
Explanation: the signal sent to the parent when this process dies.
(39) processor %d (since Linux 2.2.8)
CPU number last executed on.
Explanation: the number of the CPU this process last ran on.
(40) rt_priority %u (since Linux 2.5.19)
Real-time scheduling priority, a number in the range 1 to 99 for processes scheduled under a real-time policy, or 0, for non-real-time processes (see sched_setscheduler(2)).
Explanation: the real-time scheduling priority, 1 to 99 for processes scheduled under a real-time policy, or 0 for non-real-time processes.
(41) policy %u (since Linux 2.5.19)
Scheduling policy (see sched_setscheduler(2)). Decode using the SCHED_* constants in linux/sched.h. The format for this field was %lu before Linux 2.6.22.
Explanation: the scheduling policy of the process; decode it with the SCHED_* constants in linux/sched.h (0 = SCHED_OTHER, 1 = SCHED_FIFO, 2 = SCHED_RR).
(42) delayacct_blkio_ticks %llu (since Linux 2.6.18)
Aggregated block I/O delays, measured in clock ticks (centiseconds).
Explanation: the aggregated block I/O delay of the process, i.e. the time it has spent waiting for block I/O, measured in clock ticks (centiseconds, 1/100 s).
(43) guest_time %lu (since Linux 2.6.24)
Guest time of the process (time spent running a virtual CPU for a guest operating system), measured in clock ticks (divide by sysconf(_SC_CLK_TCK)).
Explanation: the guest time of the process, i.e. time spent running a virtual CPU for a guest operating system, in clock ticks (this corresponds to the guest field of the cpu line in /proc/stat).
(44) cguest_time %ld (since Linux 2.6.24)
Guest time of the process's children, measured in clock ticks (divide by sysconf(_SC_CLK_TCK)).
Explanation: the guest time of the process's children, in clock ticks.
(45) start_data %lu (since Linux 3.3)
Address above which program initialized and uninitialized (BSS) data are placed.
Explanation: the address above which the program's initialized and uninitialized (BSS) data are placed.
(46) end_data %lu (since Linux 3.3)
Address below which program initialized and uninitialized (BSS) data are placed.
Explanation: the address below which the program's initialized and uninitialized (BSS) data are placed.
(47) start_brk %lu (since Linux 3.3)
Address above which program heap can be expanded with brk(2).
Explanation: the address above which the program heap can be expanded with brk(2).
(48) arg_start %lu (since Linux 3.5)
Address above which program command-line arguments (argv) are placed.
Explanation: the start address of the program's command-line arguments (argv).
(49) arg_end %lu (since Linux 3.5)
Address below which program command-line arguments (argv) are placed.
Explanation: the end address of the program's command-line arguments (argv).
(50) env_start %lu (since Linux 3.5)
Address above which program environment is placed.
Explanation: the start address of the program's environment.
(51) env_end %lu (since Linux 3.5)
Address below which program environment is placed.
Explanation: the end address of the program's environment.
(52) exit_code %d (since Linux 3.5)
The thread's exit status in the form reported by waitpid(2).
Explanation: the exit status of the process, in the form reported by waitpid(2).
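As promised above, a minimal sketch (Python) for splitting /proc/<pid>/stat. The comm field (2) is wrapped in parentheses and may itself contain spaces or ')', so the safe approach is to cut the line at the last ')':

    def read_pid_stat(pid):
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
        lparen, rparen = data.index("("), data.rindex(")")
        pid_num = int(data[:lparen])
        comm = data[lparen + 1:rparen]            # field (2), may contain spaces
        rest = data[rparen + 1:].split()          # rest[0] is field (3), state
        return pid_num, comm, rest

    # Example: fields (14)-(17) (utime, stime, cutime, cstime) are rest[11:15].
    import os
    print(read_pid_stat(os.getpid())[:2])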
The process data above still lacks the process's disk read/write statistics, which are kept in /proc/<pid>/io. Its fields are explained below (a parsing sketch follows the last field):
rchar: characters read
The number of bytes which this task has caused to be read from storage. This is simply the sum of bytes which this process passed to read(2) and similar system calls. It includes things such as terminal I/O and is unaffected by whether or not actual physical disk I/O was required (the read might have been satisfied from pagecache).
wchar: characters written
The number of bytes which this task has caused, or shall cause to be written to disk. Similar caveats apply here as with rchar.
Explanation: on the face of it, rchar and wchar are the numbers of bytes the task has read and written. Note the caveat in the English text, though: they count every byte passed through read(2)/write(2) and similar system calls, including terminal I/O and requests satisfied from the page cache (which require no disk I/O), so strictly speaking they are not accurate measures of disk traffic.
syscr: read syscalls
Attempt to count the number of read I/O operations—that is, system calls such as read(2) and pread(2).
syscw: write syscalls
Attempt to count the number of write I/O operations—that is, system calls such as write(2) and pwrite(2).
Explanation: syscr and syscw count the number of read(2)/write(2) (and similar) system calls issued by the task. Issuing such a call does not necessarily trigger actual I/O, so strictly speaking these counters are not accurate measures of disk activity either.
read_bytes: bytes read
Attempt to count the number of bytes which this process really did cause to be fetched from the storage layer. This is accurate for block-backed filesystems.
write_bytes: bytes written
Attempt to count the number of bytes which this process caused to be sent to the storage layer.
Explanation: these two fields count the number of bytes the process actually caused to be read from and written to the storage layer, and are the accurate figures to use.
cancelled_write_bytes:
The big inaccuracy here is truncate. If a process writes 1MB to a file and then deletes the file, it will in fact perform no writeout. But it will have been accounted as having caused 1MB of write. In other words: this field represents the number of bytes which this process caused to not happen, by truncating pagecache. A task can cause "negative" I/O too. If this task truncates some dirty pagecache, some I/O which another task has been accounted for (in its write_bytes) will not be happening.
Explanation: this field counts bytes that were accounted as written but whose writeout never actually happened, mainly because of page-cache truncation. For example, if a process writes 1 MB to a file and then deletes the file, no writeout is performed, yet 1 MB of writes has already been accounted; this field records that cancelled amount. A task can also cause "negative" I/O this way, by truncating dirty page cache for which another task had already been accounted.
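A minimal sketch (Python) that reads /proc/<pid>/io into a dictionary (reading another user's process usually requires root):

    def read_pid_io(pid):
        io = {}
        with open(f"/proc/{pid}/io") as f:
            for line in f:
                key, value = line.split(":")
                io[key.strip()] = int(value)
        return io

    import os
    print(read_pid_io(os.getpid())["read_bytes"])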
1.6 Obtaining time values: /proc/uptime
The kernel exposes current timing information in /proc/uptime. The file contains just two values: the first is the time since boot in seconds, and the second is the cumulative time the CPUs have spent idle (summed over all CPUs), also in seconds. To obtain an absolute time, combine it with the btime field of /proc/stat, which gives the boot time relative to the Linux Epoch.
2. Computing Linux performance metrics
2.1 CPU performance
The CPU metric we usually care about is CPU utilization: the share of total CPU time during which the CPU was busy. The basic approach is to sample the cumulative CPU time counters twice over as short an interval as practical, take the busy and total deltas, and divide.
Method 1: using /proc/stat.
Take two samples of the CPU counters a short interval apart:
Total1 = user1 + nice1 + system1 + idle1 + iowait1 + irq1 + softirq1 + steal1 + guest1
Total2 = user2 + nice2 + system2 + idle2 + iowait2 + irq2 + softirq2 + steal2 + guest2
Then CPU-usage = ((Total2 - idle2) - (Total1 - idle1)) / (Total2 - Total1)
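A minimal sketch (Python) of Method 1, sampling /proc/stat twice. Whether iowait should also count as idle is a matter of definition; the sketch follows the formula above and treats only idle as non-busy:

    import time

    def cpu_sample():
        with open("/proc/stat") as f:
            fields = list(map(int, f.readline().split()[1:]))  # aggregate cpu line
        return sum(fields), fields[3]             # (total, idle)

    def cpu_usage(interval=1.0):
        total1, idle1 = cpu_sample()
        time.sleep(interval)
        total2, idle2 = cpu_sample()
        busy = (total2 - idle2) - (total1 - idle1)
        return busy / (total2 - total1)

    print(f"CPU usage: {cpu_usage():.1%}")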
Method 2: simpler, using /proc/uptime directly.
Although /proc/uptime contains only two values, the time since boot and the cumulative CPU idle time, that is enough to compute CPU utilization: CPU-usage = delta(first value - second value) / delta(first value). Note that the idle value is summed over all CPUs, so on a multi-core machine the idle delta should additionally be divided by the number of CPUs.
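A minimal sketch (Python) of Method 2, including the division by the CPU count mentioned above:

    import os
    import time

    def read_uptime():
        with open("/proc/uptime") as f:
            up, idle = map(float, f.read().split())   # seconds up, idle seconds (all CPUs)
        return up, idle

    up1, idle1 = read_uptime()
    time.sleep(1)
    up2, idle2 = read_uptime()
    usage = 1 - (idle2 - idle1) / ((up2 - up1) * os.cpu_count())
    print(f"CPU usage: {usage:.1%}")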
2.2 Memory performance
Memory performance is mainly about how much memory is in use. Computing it requires the following /proc/meminfo fields: MemTotal, MemFree, Buffers and Cached.
As discussed earlier, MemFree is the amount of currently free memory, while Buffers and Cached are memory the operating system has allocated for caching block data and file data.
From the operating system's point of view, buffers and cache are therefore in use (they have been allocated), and the occupied memory is:
MemOccupied_os = MemTotal - MemFree                        // OS point of view
From the application's point of view, however, buffers and cache are reclaimable space reserved for a special purpose and can be treated as free, so the occupied memory is:
MemOccupied_user = MemTotal - MemFree - Buffers - Cached   // application point of view
When we want the memory used by applications we should take the latter. The free command shows the same distinction: its first row reports the meminfo fields directly, i.e. free and used memory from the operating system's point of view.
Its second row, -/+ buffers/cache, adjusts used and free from the user's point of view: used - (buffers + cached) and free + (buffers + cached).
For application-oriented performance monitoring, memory usage should therefore be:
Mem-usage = (MemTotal - MemFree - Buffers - Cached) / MemTotal
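A minimal sketch (Python) implementing the formula above; on kernels from 3.14 onward the MemAvailable field is arguably a better basis, but the sketch sticks to the definition used here:

    def mem_usage(path="/proc/meminfo"):
        info = {}
        with open(path) as f:
            for line in f:
                name, rest = line.split(":", 1)
                info[name.strip()] = int(rest.split()[0])   # kB
        used = info["MemTotal"] - info["MemFree"] - info["Buffers"] - info["Cached"]
        return used / info["MemTotal"]

    print(f"memory usage: {mem_usage():.1%}")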
2.3 Disk performance
From the counters in /proc/diskstats the following metrics are easy to derive (t below is the sampling interval in seconds; a short sketch follows the list):
- reads completed per second: delta(rio)/t (field 1)
- writes completed per second: delta(wio)/t (field 5)
- sectors read per second: delta(rsect)/t (field 3)
- sectors written per second: delta(wsect)/t (field 7)
- kilobytes read per second: half of the sector read rate, since a sector is 512 bytes, i.e. delta(rsect)/t/2 (field 3)
- kilobytes written per second: half of the sector write rate, i.e. delta(wsect)/t/2 (field 7)
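A minimal sketch (Python) that computes the rates above from two samples of /proc/diskstats; the device name "sda" is just an example:

    import time

    def disk_counters(dev):
        with open("/proc/diskstats") as f:
            for line in f:
                parts = line.split()
                if parts[2] == dev:
                    v = list(map(int, parts[3:14]))
                    # fields 1, 3, 5, 7: reads, sectors read, writes, sectors written
                    return {"rio": v[0], "rsect": v[2], "wio": v[4], "wsect": v[6]}
        raise ValueError(f"device {dev!r} not found")

    def disk_rates(dev, interval=1.0):
        a = disk_counters(dev)
        time.sleep(interval)
        b = disk_counters(dev)
        rates = {k: (b[k] - a[k]) / interval for k in a}
        rates["read_kB_s"] = rates["rsect"] / 2    # one sector = 512 bytes
        rates["write_kB_s"] = rates["wsect"] / 2
        return rates

    print(disk_rates("sda"))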
2.4 Network performance
From /proc/net/dev it is easy to obtain real-time traffic figures for every network interface (a short sketch follows the list):
- receive packet rate rpkts: delta(rpkt)/t
- transmit packet rate spkts: delta(spkt)/t
- receive byte rate rbyte: delta(rbyte)/t
- transmit byte rate sbyte: delta(sbyte)/t
- error rate ...
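A minimal sketch (Python) computing the per-interface rates above from two samples of /proc/net/dev; "eth0" is a placeholder interface name:

    import time

    def net_counters(iface):
        with open("/proc/net/dev") as f:
            for line in f.readlines()[2:]:
                name, data = line.split(":", 1)
                if name.strip() == iface:
                    v = list(map(int, data.split()))
                    return {"rbyte": v[0], "rpkt": v[1], "rerrs": v[2],
                            "sbyte": v[8], "spkt": v[9], "serrs": v[10]}
        raise ValueError(f"interface {iface!r} not found")

    def net_rates(iface, interval=1.0):
        a = net_counters(iface)
        time.sleep(interval)
        b = net_counters(iface)
        return {k: (b[k] - a[k]) / interval for k in a}

    print(net_rates("eth0"))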
2.5 Process performance
Process performance likewise covers the process's CPU usage, memory usage, and disk read/write rates.
2.5.1 Process CPU usage
Process CPU usage is computed much like the system-wide CPU usage. First take the process's total CPU time, utime + stime + cutime + cstime, i.e. fields 14-17 of /proc/<pid>/stat. The process's CPU utilization, Process-Utility, is then:
Process-Utility = delta(utime + stime + cutime + cstime) / (delta(uptime) * CLK_TCK)
where uptime is the first value of /proc/uptime and CLK_TCK is the tick rate returned by sysconf(_SC_CLK_TCK); multiplying the elapsed seconds by CLK_TCK converts them to clock ticks (note: clock ticks, not jiffies).
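A minimal sketch (Python) of the formula above, monitoring its own PID as an example:

    import os
    import time

    CLK_TCK = os.sysconf("SC_CLK_TCK")

    def proc_cpu_ticks(pid):
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
        rest = data[data.rindex(")") + 1:].split()   # rest[0] is field (3)
        return sum(int(x) for x in rest[11:15])      # fields 14-17

    def uptime_seconds():
        with open("/proc/uptime") as f:
            return float(f.read().split()[0])

    def proc_cpu_usage(pid, interval=1.0):
        t1, u1 = proc_cpu_ticks(pid), uptime_seconds()
        time.sleep(interval)
        t2, u2 = proc_cpu_ticks(pid), uptime_seconds()
        return (t2 - t1) / ((u2 - u1) * CLK_TCK)

    print(f"process CPU usage: {proc_cpu_usage(os.getpid()):.1%}")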
2.5.2 Process memory usage
Process memory usage is usually described by two numbers: VSZ, the size of the virtual address space the process has allocated, and RSS, the size of its resident set in physical memory. Both can be read directly from /proc/<pid>/stat (fields 23 and 24); vsize is given in bytes, while rss is given in pages and must be multiplied by the page size to get bytes.
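A minimal sketch (Python) that reads VSZ and RSS from fields 23 and 24 of /proc/<pid>/stat and converts RSS from pages to bytes:

    import os

    PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")          # usually 4096 bytes

    def proc_mem(pid):
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
        rest = data[data.rindex(")") + 1:].split()  # rest[0] is field (3)
        vsz_bytes = int(rest[20])                   # field 23: vsize
        rss_pages = int(rest[21])                   # field 24: rss
        return vsz_bytes, rss_pages * PAGE_SIZE

    vsz, rss = proc_mem(os.getpid())
    print(f"VSZ={vsz} bytes, RSS={rss} bytes")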
2.5.3 Process disk read/write rate
Per-process disk I/O statistics live in /proc/<pid>/io. As explained earlier, the first four lines of that file do not accurately reflect disk traffic; read_bytes and write_bytes are the fields that really measure disk I/O, so the process's disk read and write rates should be computed from them:
Readbyte = delta(read_bytes) / delta(uptime)
Writebyte = delta(write_bytes) / delta(uptime)
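A minimal sketch (Python) computing the two rates above from two samples of /proc/<pid>/io; it monitors its own PID, which needs no extra privileges:

    import os
    import time

    def pid_io(pid):
        io = {}
        with open(f"/proc/{pid}/io") as f:
            for line in f:
                key, value = line.split(":")
                io[key.strip()] = int(value)
        return io

    def pid_io_rates(pid, interval=1.0):
        a = pid_io(pid)
        time.sleep(interval)
        b = pid_io(pid)
        return {"read_B_s": (b["read_bytes"] - a["read_bytes"]) / interval,
                "write_B_s": (b["write_bytes"] - a["write_bytes"]) / interval}

    print(pid_io_rates(os.getpid()))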