IBM System Monitoring Tool nmon: Command Reference (2)

Date: 2022-01-29 23:16:26

First, take a look at the help output of the nmon command:

[root@linux nmon]# ./nmon.sh -h

Hint: nmon.sh [-h] [-s <seconds>] [-c <count>] [-f -d <disks> -t -r <name>] [-x]

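-h full help information; there are two modes: a) interactive command-line mode (h) b) data-collect mode (-f)
-f spreadsheet output format [note: default -s300 -c288] optional (300 s x 288 samples = 86400 s = 60*60*24 s = 1 day)
-s <seconds> interval between screen refreshes [default 2]
-c <number> number of screen refreshes [default 1000000]
-d <disks> to increase the number of disks [default 256]
-t spreadsheet includes top processes
-x capacity planning (every 15 minutes for 1 day = -fdt -s 900 -c 96)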

Version - nmon 14g

For interactive command-line mode
-s <seconds> interval between screen refreshes [default 2]
-c <number> number of screen refreshes [default 1000000]
-g <filename> User Defined Disk Groups [hit g to show them]
- file = on each line: group_name <disks list> space separated
- like: database sdb sdc sdd sde
- upto 64 disk groups, 512 disks per line
- disks can appear more than once and in many groups
-b black-and-white interactive display [default is colour]
Example: nmon.sh -s 1 -c 100 (in interactive mode, refresh the screen once per second and take 100 snapshots in total)

For data-collect mode = spreadsheet format (comma-separated values)
Note: use only one of f,F,z,x or X and make it the first argument
-f spreadsheet output format [note: default -s300 -c288]
output file is <hostname>_YYYYMMDD_HHMM.nmon
-F <filename> same as -f but with a user-supplied file name
-r <runname> run name used in the spreadsheet file [default hostname]
-t include top processes in the output
-T as -t plus saves command line arguments in UARG section
-s <seconds> interval between data snapshots
-c <number> number of data snapshots to collect
-d <disks> to increase the number of disks [default 256]
-l <dpl> disks/line default 150 to avoid spreadsheet issues. EMC=64.
-g <filename> User Defined Disk Groups (see above) - see BBBG & DG lines
-N include NFS Network File System
-I <percent> Include process & disks busy threshold (default 0.1)
don't save or show proc/disk using less than this percent
-m <directory> directory in which the generated data file is saved
Example: collect top procs at 30-second intervals for 1 hour
nmon.sh -f -t -r Test1 -s30 -c120

To load into a spreadsheet:
sort -A *nmon >stats.csv
transfer the stats.csv file to your PC
Start spreadsheet & then Open type=comma-separated-value ASCII file
The nmon analyser or consolidator does not need the file sorted.

Capacity planning mode - use cron to run each day
-x sensible spreadsheet output for CP = one day
every 15 minutes for 1 day ( i.e. -ft -s 900 -c 96)
-X sensible spreadsheet output for CP = busy hour
every 30 seconds for 1 hour ( i.e. -ft -s 30 -c 120)

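Interactive-mode keys
key --- Toggles to control what is displayed ---
h = online help information
r = machine type, hostname, cache info and OS version + LPAR
c = per-processor CPU statistics with bar graphs
l = long-term CPU bar graph (over 75 snapshots)
m = memory statistics
L = huge (large) page statistics
V = virtual memory and swap statistics
k = kernel internal statistics
n = network statistics and errors
N = NFS (Network File System)
d = disk I/O graphs
D = disk I/O statistics
o = disk I/O map (one character per disk showing how busy it is)
j = filesystems
t = *top-process statistics; use 1, 3, 4, 5 to select the data and ordering
u = *top-process command details
v = verbose simple checks - OK/Warn/Danger
b = black & white mode (or use the -b option)
. = minimum mode, i.e. show only busy disks and processes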

key --- Other Controls ---
+ = double the screen refresh time
- = halve the screen refresh time
q = quit (also x, e or control-C)
0 = reset peak counts to zero (peak = ">")
space = refresh the screen immediately

Startup Control
If you find you always type the same toggles every time you start
then place them in the NMON shell variable. For example:
export NMON=cmdrvtan

Others:
a) If you want to stop nmon - kill -USR2 <nmon-pid>
b) Use -p and nmon outputs the background process pid
c) To limit the processes nmon lists (online and to a file)
Either set NMONCMD0 to NMONCMD63 to the program names
or use -C cmd:cmd:cmd etc. example: -C ksh:vi:syncd
d) If you want to pipe nmon output to other commands use a FIFO:
mkfifo /tmp/mypipe
nmon -F /tmp/mypipe &
grep /tmp/mypipe
e) If nmon fails please report it with:
1) nmon version like: 14g
2) the output of cat /proc/cpuinfo
3) some clue of what you were doing
4) I may ask you to run the debug version

Developer Nigel Griffiths
Feedback welcome - on the current release only and state exactly the problem
No warranty given or implied.
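
The -s (interval) and -c (count) pairs quoted in the help text determine how long a collection run lasts. As a quick check of that arithmetic, the Python sketch below multiplies them out. The function name `coverage` and the preset labels are illustrative only and are not part of nmon.

```python
from datetime import timedelta


def coverage(interval_s: int, count: int) -> timedelta:
    """Return the time span covered by `count` snapshots taken every `interval_s` seconds."""
    return timedelta(seconds=interval_s * count)


# Interval/count pairs taken from the help text; the labels are illustrative.
presets = {
    "-f default (-s300 -c288)": (300, 288),
    "-x (-s 900 -c 96)": (900, 96),
    "-X (-s 30 -c 120)": (30, 120),
}

for label, (interval_s, count) in presets.items():
    print(f"{label}: {coverage(interval_s, count)}")
```

The first two presets come out to exactly one day and the third to one hour, which matches the "one day" and "busy hour" descriptions in the help text.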

Run the following command at the shell prompt to enter the nmon monitoring screen:

[root@linux nmon]# ./nmon.sh
+nmon-14g------[H for help]---Hostname=linux--------Refresh= 2secs ---04:22.50-----------------------------------------------------------------------------------------------------------------+
|                                                                                                                                                                                              |
|  ------------------------------       For help type H or ...                                                                                                                                 |
|  #    #  #    #   ####   #    #        nmon -?  - hint                                                                                                                                       |
|  ##   #  ##  ##  #    #  ##   #        nmon -h  - full                                                                                                                                       |
|  # #  #  # ## #  #    #  # #  #                                                                                                                                                              |
|  #  # #  #    #  #    #  #  # #       To start the same way every time                                                                                                                       |
|  #   ##  #    #  #    #  #   ##        set the NMON ksh variable                                                                                                                             |
|  #    #  #    #   ####   #    #                                                                                                                                                              |
|  ------------------------------                                                                                                                                                              |
|                                                                                                                                                                                              |
|  Use these keys to toggle statistics on/off:                                                                                                                                                 |
|     c = CPU        l = CPU Long-term   - = Faster screen updates                                                                                                                             |
|     m = Memory     j = Filesystems     + = Slower screen updates                                                                                                                             |
|     d = Disks      n = Network         V = Virtual Memory                                                                                                                                    |
|     r = Resource   N = NFS             v = Verbose hints                                                                                                                                     |
|     k = kernel     t = Top-processes   . = only busy disks/procs                                                                                                                             |
|     h = more options                   q = Quit                                                                                                                                              |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|

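The start-up screen lists the main keys along with some basic information. Refresh= 2secs means the monitoring screen refreshes every 2 seconds; pass the -s option to nmon to choose a different interval. Press h for a more detailed key reference: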

+nmon-14g------[H for help]---Hostname=linux--------Refresh= 2secs ---04:27.49-----------------------------------------------------------------------------------------------------------------+
| HELP ------------------------------------------------------------------------------------------------------------------------------------------------ |
| key --- statistics which toggle on/off --- |
| h = This help information |
| r = RS6000/pSeries CPU/cache/OS/kernel/hostname details + LPAR |
| t = Top Process Stats 1=basic 3=CPU |
| u = shows command arguments (hit twice to refresh) |
| c = CPU by processor l = longer term CPU averages |
| m = Memory & Swap stats L=Huge j = JFS Usage Stats |
| n = Network stats N = NFS |
| d = Disk I/O Graphs D=Stats o = Disks %Busy Map |
| k = Kernel stats & loadavg V = Virtual Memory |
| g = User Defined Disk Groups [start nmon with -g <filename>] |
| v = Verbose Simple Checks - OK/Warnings/Danger |
| b = black & white mode |
| --- controls --- |
| + and - = double or half the screen refresh time |
| q = quit space = refresh screen now |
| . = Minimum Mode =display only busy disks and processes |
| 0 = reset peak counts to zero (peak = ">") |
| Developer Nigel Griffiths see http://nmon.sourceforge.net |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
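
The help text also mentions the NMON shell variable (for example `export NMON=cmdrvtan`) as a way to pre-select panels at start-up. To see what such a string switches on, here is a small Python sketch that maps each letter to its description in the HELP panel. The `PANELS` table and `describe_toggles` are hypothetical helpers written for this article, not part of nmon.

```python
# Key-to-panel table transcribed from the interactive HELP panel.
PANELS = {
    "h": "help information",
    "r": "CPU/cache/OS/kernel/hostname details",
    "t": "top process stats",
    "u": "command arguments",
    "c": "CPU by processor",
    "l": "longer term CPU averages",
    "m": "memory & swap stats",
    "L": "huge pages",
    "j": "filesystem usage",
    "n": "network stats",
    "N": "NFS",
    "d": "disk I/O graphs",
    "D": "disk I/O stats",
    "o": "disks %busy map",
    "k": "kernel stats & loadavg",
    "V": "virtual memory",
    "g": "user defined disk groups",
    "v": "verbose simple checks",
    "b": "black & white mode",
}


def describe_toggles(keys: str) -> list[str]:
    """Translate a string of toggle keys into panel descriptions, in order."""
    return [PANELS.get(k, f"unknown key {k!r}") for k in keys]


for line in describe_toggles("cmdrvtan"):
    print(line)
```

The letter a in the example string does not appear in this HELP panel, so the sketch reports it as unknown rather than guessing what it does.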
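The help panel lists every key you can press and what each one displays. Note that pressing h once shows the help panel and pressing h again hides it; the other keys toggle their panels the same way. Now press r (machine type, hostname, cache info and OS version + LPAR):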

| Linux and Processor Details -------------------------------------------------------------------------------------------------------------------------                                        |
| Linux: Linux version 2.6.18-164.el5 (mockbuild@x86-002.build.bos.redhat.com) |
| Build: (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) |
| Release : 2.6.18-164.el5 |
| Version : #1 SMP Tue Aug 18 15:51:54 EDT 2009 |
| cpuinfo: model name : Intel(R) Core(TM) i3-2310M CPU @ 2.10GHz |
| cpuinfo: vendor_id : GenuineIntel |
| cpuinfo: cpu MHz : 2093.260 |
| cpuinfo: bogomips : 4186.52 |
| # of CPUs: 1 --1 CPU |
| Machine : i686 |
| Nodename : linux --hostname |
| /etc/*ease[1]: Red Hat Enterprise Linux Server release 5.4 (Tikanga) --OS version |
| /etc/*ease[2]: (null) |
| /etc/*ease[3]: (null) |
| /etc/*ease[4]: (null) |
| lsb_release: Distributor ID: RedHatEnterpriseServer |
| lsb_release: Description: Red Hat Enterprise Linux Server release 5.4 (Tikanga) |
| lsb_release: Release: 5.4 |
| lsb_release: Codename: Tikanga |
+---------Warning: Some Statistics may not shown-----------------------------------------------------------------------------------------------------------------------------------------------+
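This panel shows host and operating-system details; press r again to hide it. Next, press t (top-process statistics; use 1, 3, 4, 5 to select the data and ordering), then press the digit 5: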

| Top Processes Procs=85 mode=5 (1=Basic, 3=Perf 4=Size 5=I/O)---------------------------------------------------------------------------------------------------------------------------------|
| PID %CPU Size Res Res Res Res Shared Faults Command |
| Used KB Set Text Data Lib KB Min Maj |
| 4050 0.5 12748 10548 108 10896 0 832 84 0 nmon.sh |
| 1 0.0 2072 624 32 280 0 532 0 0 init |
| 2 0.0 0 0 0 0 0 0 0 0 migration/0 |
| 3 0.0 0 0 0 0 0 0 0 0 ksoftirqd/0 |
| 4 0.0 0 0 0 0 0 0 0 0 watchdog/0 |
| 5 0.0 0 0 0 0 0 0 0 0 events/0 |
| 6 0.0 0 0 0 0 0 0 0 0 khelper |
| 7 0.0 0 0 0 0 0 0 0 0 kthread |
| 10 0.0 0 0 0 0 0 0 0 0 kblockd/0 |
| 11 0.0 0 0 0 0 0 0 0 0 kacpid |
| 67 0.0 0 0 0 0 0 0 0 0 cqueue/0 |
| 70 0.0 0 0 0 0 0 0 0 0 khubd |
| 72 0.0 0 0 0 0 0 0 0 0 kseriod |
| 136 0.0 0 0 0 0 0 0 0 0 pdflush |
| 137 0.0 0 0 0 0 0 0 0 0 pdflush |
| 138 0.0 0 0 0 0 0 0 0 0 kswapd0 |
| 139 0.0 0 0 0 0 0 0 0 0 aio/0 |
+---------Warning: Some Statistics may not shown-----------------------------------------------------------------------------------------------------------------------------------------------+
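Note that mode=5 means the list is sorted by I/O. The other modes named in the header (1=Basic, 3=Perf, 4=Size) are selected the same way, by pressing the corresponding digit. The header also shows that the system has 85 processes (Procs=85). Next, press u (top-process command details):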

| Top Processes Procs=85 mode=5 (1=Basic, 3=Perf 4=Size 5=I/O)---------------------------------------------------------------------------------------------------------------------------------|
| PID %CPU ResSize Command Command |
| Used KB |
| 4050 1.0 10660 ./nmon.sh |
| 1 0.0 624 init [3] |
| 2 0.0 0 [migration/0] |
| 3 0.0 0 [ksoftirqd/0] |
| 4 0.0 0 [watchdog/0] |
| 5 0.0 0 [events/0] |
| 6 0.0 0 [khelper] |
| 7 0.0 0 [kthread] |
| 10 0.0 0 [kblockd/0] |
| 11 0.0 0 [kacpid] |
| 67 0.0 0 [cqueue/0] |
| 70 0.0 0 [khubd] |
| 72 0.0 0 [kseriod] |
| 136 0.0 0 [pdflush] |
| 137 0.0 0 [pdflush] |
| 138 0.0 0 [kswapd0] |
| 139 0.0 0 [aio/0] |
+---------Warning: Some Statistics may not shown-----------------------------------------------------------------------------------------------------------------------------------------------+
The output above is self-explanatory, so I won't go into more detail. Next, press c (per-processor CPU statistics with bar graphs):

| CPU Utilisation -------------------------------------------------------------------------------------------------------------------------------------                                        |
|---------------------------+-------------------------------------------------+ |
|CPU User% Sys% Wait% Idle|0 |25 |50 |75 100| |
| 1 0.0 0.0 0.0 100.0| > | |
|---------------------------+-------------------------------------------------+ |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
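The panel shows the system is completely idle (Idle=100%). The ">" marks the peak CPU usage seen so far; pressing the digit 0 resets the peak to zero. Next, press l (long-term CPU bar graph):
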
| CPU +-------------------------------------------------------------------------+ |
|100%-|          | |
| 95%-|          | |
| 90%-|          | |
| 85%-|          | |
| 80%-|          | |
| 75%-|          | |
| 70%-|          | |
| 65%-|          | |
| 60%-|          | |
| 55%-|          | |
| 50%-|          | |
| 45%-|          | |
| 40%-|          | |
| 35%-|          | |
| 30%-|          | |
| 25%-|          | |
| 20%-|          | |
| 15%-|          | |
| 10%-|          | |
+---------Warning: Some Statistics may not shown-----------------------------------------------------------------------------------------------------------------------------------------------+
This is another view of CPU usage; the "|" marker works on the same principle as the ">" above. Next, press m (memory statistics):

| Memory Stats ----------------------------------------------------------------------------------------------------------------------------------------                                        |
| RAM High Low Swap Page Size=4 KB |
| Total MB 503.3 0.0 503.3 1027.6 |
| Free MB 192.3 0.0 192.3 1027.6 |
| Free Percent 38.2% 0.0% 38.2% 100.0% |
| MB MB MB |
| Cached= 200.6 Active= 107.4 |
| Buffers= 46.4 Swapcached= 0.0 Inactive = 169.7 |
| Dirty = 0.2 Writeback = 0.0 Mapped = 9.5 |
| Slab = 26.8 Commit_AS = 130.0 PageTables= 1.3 |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
The panel shows that swap is 100% free (entirely unused) and that 38.2% of physical RAM is free. Next, press L (huge page statistics):

| Large (Huge) Page Stats -----------------------------------------------------------------------------------------------------------------------------                                        |
| There are no Huge Pages |
| - see /proc/meminfo |
| |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
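I don't fully understand this panel myself; here it simply reports that there are no huge pages and points to /proc/meminfo. Next, press j (filesystems):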

| Filesystems -----------------------------------------------------------------------------------------------------------------------------------------                                        |
|Filesystem SizeMB FreeMB Use% Type MountPoint |
|/dev/sda3 48502 36378 21% ext3 / |
|/proc - - - proc not a real filesystem |
|/sys - - - sysfs not a real filesystem |
|/dev/pts - - - devpts not a real filesystem |
|/dev/sda1 99 82 12% ext3 /boot |
|/dev/shm - - - tmpfs not a real filesystem |
|/proc/sys/fs/binfmt_misc - - - binfmt_m not a real filesystem |
|/var/lib/nfs/rpc_pipefs rpc_pipe size=zero blocks! |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
This panel shows filesystem usage at a glance. Next, press n (network statistics and errors):

| Network I/O -----------------------------------------------------------------------------------------------------------------------------------------                                        |
|I/F Name Recv=KB/s Trans=KB/s packin packout insize outsize Peak->Recv Trans |
| lo 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.2 |
| eth0 0.0 0.1 0.5 0.5 60.0 218.0 52.4 99.0 |
| sit0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 |
| Network Error Counters ------------------------------------------------------------------------------------------------------------------------------ |
|I/F Name iErrors iDrop iOverrun iFrame oErrors oDrop oOverrun oCarrier oColls |
| lo 0 0 0 0 0 0 0 0 0 |
| eth0 0 0 0 0 0 0 0 0 0 |
| sit0 0 0 0 0 0 0 0 0 0 |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
Next, press N (NFS, Network File System):

| Network Filesystem (NFS) I/O Operations per second --------------------------------------------------------------------------------------------------                                        |
| Version 2 Client Server Version 3 Client Server |
| null 0.0 0.0 null 0.0 0.0 |
| getattr 0.0 0.0 getattr 0.0 0.0 |
| setattr 0.0 0.0 setattr 0.0 0.0 |
| root 0.0 0.0 lookup 0.0 0.0 |
| lookup 0.0 0.0 access 0.0 0.0 |
| readlink 0.0 0.0 readlink 0.0 0.0 |
| read 0.0 0.0 read 0.0 0.0 |
| wrcache 0.0 0.0 write 0.0 0.0 |
| write 0.0 0.0 create 0.0 0.0 |
| create 0.0 0.0 mkdir 0.0 0.0 |
| remove 0.0 0.0 symlink 0.0 0.0 |
| rename 0.0 0.0 mknod 0.0 0.0 |
| link 0.0 0.0 remove 0.0 0.0 |
| symlink 0.0 0.0 rmdir 0.0 0.0 |
| mkdir 0.0 0.0 rename 0.0 0.0 |
| rmdir 0.0 0.0 link 0.0 0.0 |
| readdir 0.0 0.0 readdir 0.0 0.0 |
| fsstat 0.0 0.0 readdirplus 0.0 0.0 |
+---------Warning: Some Statistics may not shown-----------------------------------------------------------------------------------------------------------------------------------------------+
Next, press d (disk I/O graphs):

| Disk I/O --/proc/diskstats----mostly in KB/s-----Warning:contains duplicates-------------------------------------------------------------------------                                        |
|DiskName Busy Read WriteKB|0 |25 |50 |75 100| |
|sda 0% 0.0 0.0|> | |
|sda1 0% 0.0 0.0|> | |
|sda2 0% 0.0 0.0|> | |
|sda3 0% 0.0 0.0|> | |
|Totals Read-MB/s=0.0 Writes-MB/s=0.0 Transfers/sec=0.0 |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
Next, press D (disk I/O statistics):

| Disk I/O --/proc/diskstats----mostly in KB/s-----Warning:contains duplicates-------------------------------------------------------------------------                                        |
|DiskName Busy Read Write Xfers Size Peak% Peak-RW InFlight |
|sda 0% 0.0 0.0KB/s 0.0 0.0KB 0% 8.0KB/s 0 | |
|sda1 0% 0.0 0.0KB/s 0.0 0.0KB 0% 0.0KB/s 0 | |
|sda2 0% 0.0 0.0KB/s 0.0 0.0KB 0% 0.0KB/s 0 | |
|sda3 0% 0.0 0.0KB/s 0.0 0.0KB 0% 8.0KB/s 0 | |
|Totals Read-MB/s=0.0 Writes-MB/s=0.0 Transfers/sec=0.0 |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
Next, press o (disk I/O map):

| Disk %Busy Map --Key: @=90 #=80 X=70 8=60 O=50 0=40 o=30 +=20 -=10 .=5 _=0%--------------------------------------------------------------------------                                        |
|             Disk No.  1         2         3         4         5         6 |
|Disks=4      0123456789012345678901234567890123456789012345678901234567890123 |
|disk 0 to 63 ____ |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
Next, press k (kernel internal statistics):

| Kernel Stats ----------------------------------------------------------------------------------------------------------------------------------------                                        |
| RunQueue 1 Load Average CPU use since boot time |
| ContextSwitch 30228.5 1 mins 0.00 Uptime Days= 0 Hours= 2 Mins=29 |
| Forks 25.9 5 mins 0.00 Idle Days= 0 Hours= 2 Mins=28 |
| Interrupts 738415.7 15 mins 0.00 Average CPU use= 0.98% |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
Next, press V (virtual memory and swap statistics):

| Virtual-Memory --------------------------------------------------------------------------------------------------------------------------------------                                        |
|nr_dirty    =        0 pgpgin      =       0                High Normal    DMA |
|nr_writeback=        0 pgpgout     =       0  alloc            0      0      0 |
|nr_unstable =        0 pgpswpin    =       0  refill           0      0      0 |
|nr_table_pgs=      330 pgpswpout   =       0  steal            0      0      0 |
|nr_mapped   =     2438 pgfree      =       0  scan_kswapd      0      0      0 |
|nr_slab     =     6852 pgactivate  =       0  scan_direct      0      0      0 |
|                       pgdeactivate=       0 |
|allocstall  =        0 pgfault     =       7  kswapd_steal     =      0 |
|pageoutrun  =        0 pgmajfault  =       0  kswapd_inodesteal=      0 |
|slabs_scanned=       0 pgrotated   =       0  pginodesteal     =      0 |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
Next, press v (verbose simple checks):

| Verbose Mode ----------------------------------------------------------------------------------------------------------------------------------------                                        |
| Code Resource Stats Now Warn Danger |
| OK -> CPU %busy 0.0% >80% >90% |
| OK -> Top Disk %busy 0.0% >40% >60% |
| |
| |
| HELP ------------------------------------------------------------------------------------------------------------------------------------------------ |
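This panel gives a quick health check for CPU and disk with Warn and Danger levels: CPU usage above 80% is a warning and above 90% is danger. The same logic applies to disk, with thresholds of 40% and 60%.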
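
The Warn and Danger columns amount to a simple threshold check. A minimal Python sketch of that logic follows, using the limits shown in the Verbose Mode panel (CPU 80%/90%, top disk 40%/60%). The `classify` function and its strict greater-than comparison are my reading of the `>80%` notation, not nmon's actual code.

```python
def classify(value: float, warn: float, danger: float) -> str:
    """Return OK, Warn or Danger for a %busy value, using strict '>' as in the panel."""
    if value > danger:
        return "Danger"
    if value > warn:
        return "Warn"
    return "OK"


# Limits copied from the Verbose Mode panel: (warn, danger) in percent.
LIMITS = {"CPU %busy": (80.0, 90.0), "Top Disk %busy": (40.0, 60.0)}

# The sample screen shows 0.0% for both resources.
for resource, (warn, danger) in LIMITS.items():
    print(resource, classify(0.0, warn, danger))
```

With the 0.0% readings from the sample screen both resources are classified as OK, which matches the "OK ->" codes in the panel.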

nmon can collect very detailed system information, but in practice it is more often used in sampling (data-collect) mode to generate reports.
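
For data-collect mode, the help text gives the default output name as <hostname>_YYYYMMDD_HHMM.nmon. When scripting report generation it can be handy to predict that name, so the Python sketch below builds it from a host name and a start time. The helper `default_nmon_filename` is hypothetical, and it assumes the timestamp is the local start time of the run, which the help text does not state explicitly.

```python
from datetime import datetime


def default_nmon_filename(hostname: str, started: datetime) -> str:
    """Build <hostname>_YYYYMMDD_HHMM.nmon for a run that began at `started`."""
    return f"{hostname}_{started:%Y%m%d_%H%M}.nmon"


# Example values only: the host name "linux" from the screens and an arbitrary start time.
print(default_nmon_filename("linux", datetime(2022, 1, 29, 23, 16)))
```

If the run is started with -F <filename> instead of -f, the name is whatever the user supplies and this helper does not apply.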