Basic monitoring:
Processor:
% Processor Time: current CPU utilization, as a percentage
Memory:
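Available MBytes: currently available memory, in megabytes (virtual memory does not need separate monitoring: it is only used when physical memory runs short, and physical memory is already monitored)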
LogicalDisk:
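% Free Space: free space on the logical partition, as a percentage (physical disk I/O is not monitored here because RAID levels differ and some machines have no RAID at all, so no uniform alert threshold can be defined)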
Network Interface:
Bytes Total/sec: network adapter traffic, sent plus received, in bytes per second
TCPv4:
Connections Established: current number of TCP connections (state ESTABLISHED or CLOSE-WAIT)
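The basic set above could be sampled with the built-in Windows `typeperf` command, which prints counter samples as CSV. The sketch below assumes that CSV layout (a header row whose first cell is the PDH-CSV version string and whose other cells are counter paths, then one timestamped row per sample). The `_Total`/`*` instance choices and the `parse_typeperf_csv` helper are illustrative assumptions, not part of the original notes.

```python
import csv
import io

# Counter paths for the basic monitoring set (instance choices are assumptions).
BASIC_COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\Memory\Available MBytes",
    r"\LogicalDisk(_Total)\% Free Space",
    r"\Network Interface(*)\Bytes Total/sec",
    r"\TCPv4\Connections Established",
]


def parse_typeperf_csv(text):
    """Parse typeperf-style CSV text into {counter_path: [float, ...]}.

    Assumes row 0 is ("(PDH-CSV 4.0)", path1, path2, ...) and later rows are
    (timestamp, value1, value2, ...). Other lines are skipped.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0]
    paths = header[1:]
    series = {path: [] for path in paths}
    for row in rows[1:]:
        if len(row) != len(header):
            continue  # blank line or status message, not a data row
        try:
            values = [float(cell) if cell.strip() else None for cell in row[1:]]
        except ValueError:
            continue  # a text line that happens to contain commas
        for path, value in zip(paths, values):
            if value is not None:  # empty cell means no sample for this counter
                series[path].append(value)
    return series
```

For instance, the counter paths could be written one per line to a text file and collected with something like `typeperf -cf counters.txt -si 15 -o basic.csv`, after which the file contents are passed to `parse_typeperf_csv`.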
==================================================
CPU:
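% Processor Time
% Privileged Time
Percentage of time the CPU spends running threads in privileged (kernel) mode. Ordinary system services, process management, memory management, and other work started by the operating system itself fall into this category.
% User Time
The counterpart of % Privileged Time: the percentage of time spent on operations in user (non-privileged) mode. If this value is high, consider whether it can be reduced, for example through algorithm optimization. On a database server, a high value is most likely caused by database sorts or function calls consuming too much CPU time, and tuning the database system is worth considering.
% DPC Time
Percentage of time the processor spends servicing deferred procedure calls (DPCs) queued by device drivers, which here mainly means network processing; lower is better. On a multiprocessor system, if this value is above 50% and % Processor Time is very high, adding a network adapter may improve performance.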
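A minimal sketch of how the CPU rules above might be applied to one sample of counter values. The `analyze_cpu` function, its finding labels, and the `very_high_cpu` threshold of 90% are hypothetical; the notes only say "very high".

```python
def analyze_cpu(processor_time, privileged_time, user_time, dpc_time,
                processor_count, very_high_cpu=90.0):
    """Return findings for one sample of the CPU counters (all in percent).

    very_high_cpu is a hypothetical cut-off for "very high" % Processor Time.
    """
    findings = []
    # Which mode dominates the busy time?
    if privileged_time > user_time:
        findings.append("kernel-heavy")  # OS services, process/memory management
    else:
        findings.append("user-heavy")    # application code, e.g. DB sorts/functions
    # Rule from the notes: multiprocessor, DPC > 50%, and CPU very high.
    if processor_count > 1 and dpc_time > 50.0 and processor_time >= very_high_cpu:
        findings.append("consider-extra-nic")
    return findings
```

For example, `analyze_cpu(95, 70, 25, 60, processor_count=4)` flags both kernel-heavy load and the extra-adapter suggestion, while the same values on a single-processor machine only report kernel-heavy load.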
Memory:
Available Bytes
Pages/sec
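Rate at which pages are read from or written to disk to resolve hard page faults (pages that are not in physical memory); it is the sum of Pages Input/sec and Pages Output/sec. A high Pages/sec value does not necessarily mean there is a memory problem: it may be caused by programs that use memory-mapped files, and the operating system often uses disk paging to increase the memory available to the system or to use memory more efficiently. (Note the difference from Page Faults/sec, which only indicates that data was not immediately available in the specified working set and counts both hard and soft faults.)
The Page Faults/sec counter can help confirm that disk activity is not caused by paging. In Windows, paging is caused by processes configured to use too much memory, or by file system activity.
If one hard disk holds several logical partitions, use the Logical Disk counters rather than the Physical Disk counters. Logical disk counters help identify which files are accessed frequently. When you see heavy read/write activity on a disk, check the read- and write-specific counters (for example, Logical Disk: Disk Write Bytes/sec) to determine which type of disk activity is increasing the load on each logical volume.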
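Pages Input/sec
Rate at which pages are read from disk to resolve hard page faults (reference: >= Page Reads/sec, because one read operation can bring in several pages)
Page Reads/sec
Number of disk read operations per second performed to resolve hard page faults, regardless of how many pages each read retrieves (reference: <= 5)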
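A small sketch that applies the paging reference values to one sample. It also uses Pages Output/sec, the companion counter to Pages Input/sec (Pages/sec is their sum); the `check_paging` helper and its field names are hypothetical.

```python
def check_paging(pages_input_per_sec, pages_output_per_sec, page_reads_per_sec,
                 max_page_reads=5.0):
    """Apply the paging reference values to one sample of Memory counters."""
    return {
        # Pages/sec is the sum of Pages Input/sec and Pages Output/sec.
        "pages_per_sec": pages_input_per_sec + pages_output_per_sec,
        # Average pages brought in per hard-fault read operation (>= 1 when reads occur).
        "pages_per_read": (pages_input_per_sec / page_reads_per_sec
                           if page_reads_per_sec else 0.0),
        # Reference value from the notes: Page Reads/sec <= 5.
        "page_reads_ok": page_reads_per_sec <= max_page_reads,
    }
```

A `pages_per_read` value well above 1 simply means each hard-fault read is fetching several pages at once, which is why Pages Input/sec is expected to be at least Page Reads/sec.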
If you suspect a memory leak, monitor Memory\Available Bytes and Memory\Committed Bytes to observe overall memory behavior, and monitor Process\Private Bytes, Process\Working Set, and Process\Handle Count for the processes you suspect of leaking. If you suspect that kernel-mode code is causing the leak, also monitor Memory\Pool Nonpaged Bytes, Memory\Pool Nonpaged Allocs, and Process(process_name)\Pool Nonpaged Bytes.
When a memory leak occurs, Process\Private Bytes and Process\Working Set tend to rise while Memory\Available Bytes falls.
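Private Bytes
The current number of bytes a process has allocated that cannot be shared with other processes. This counter is mainly used to determine whether a process leaks memory during a performance test.
For example, for a web application hosted on IIS, focus on the Private Bytes of the inetinfo process (in IIS 6.0 and later, application code runs in w3wp.exe worker processes instead). If that process's Private Bytes keeps rising during the performance test, or stays at a high level for some time after the test stops, the application has a memory leak.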
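The two Private Bytes rules (steady growth during the test, or a high level that persists after it) can be turned into a rough heuristic. This is a sketch under assumptions: the least-squares slope as the "keeps rising" test, and the `min_growth_per_sample` and `retained_fraction` thresholds, are hypothetical choices to tune per application.

```python
def slope(values):
    """Least-squares slope of `values` against their sample index."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_like_leak(during_test, after_test,
                    min_growth_per_sample=0.0, retained_fraction=0.9):
    """Heuristic leak check on Process\\Private Bytes samples (in bytes).

    during_test: non-empty list of samples taken while the load test runs.
    after_test:  samples taken some time after the test stops.
    Both thresholds are hypothetical.
    """
    # Rule 1: Private Bytes keeps rising while the test runs.
    growing = slope(during_test) > min_growth_per_sample
    # Rule 2: after the test, Private Bytes stays near its peak.
    stays_high = bool(after_test) and min(after_test) >= retained_fraction * max(during_test)
    return growing or stays_high
```

With noisy real data a zero growth threshold will flag tiny upward drifts, so in practice `min_growth_per_sample` would be set to a meaningful number of bytes per sample.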
Disk:
PhysicalDisk\Avg. Disk sec/Read
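Average time, in seconds, to read data from this disk.
PhysicalDisk\Disk Reads/sec
Rate of read operations on the disk.
PhysicalDisk\Avg. Disk sec/Write
Average time, in seconds, to write data to this disk.
PhysicalDisk\Disk Writes/sec
Rate of write operations on the disk.
PhysicalDisk\Avg. Disk sec/Transfer
Average time the disk takes to complete a request. A high value can indicate that the disk controller is repeatedly retrying the disk because of failures; such retries increase the average disk transfer time.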
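% Disk Time and Avg. Disk Queue Length
On RAID disks, the % Disk Time counter can show values greater than 100%. In that case, use the PhysicalDisk: Avg. Disk Queue Length counter to determine the average number of system requests waiting for disk access.
For non-RAID disks, use the % Disk Time and Current Disk Queue Length counters to decide whether the disk is a bottleneck: if both values stay high, the disk is probably a bottleneck.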
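A sketch of the RAID / non-RAID decision above applied to a window of samples. The 90% busy threshold and the "at least 80% of samples" reading of "stay high" are hypothetical; the 2-requests-per-disk queue limit is borrowed from the English notes on Current Disk Queue Length later in this section. The `spindles` parameter is the number of physical disks behind the volume.

```python
def mostly_above(samples, threshold, fraction=0.8):
    """True if at least `fraction` of the samples exceed `threshold`."""
    if not samples:
        return False
    above = sum(1 for s in samples if s > threshold)
    return above >= fraction * len(samples)


def disk_bottleneck(is_raid, pct_disk_time, current_queue, avg_queue,
                    spindles=1, busy_pct=90.0, queue_per_spindle=2.0):
    """Apply the RAID / non-RAID rules to lists of counter samples.

    busy_pct and the 80% fraction are hypothetical thresholds.
    """
    queue_limit = queue_per_spindle * spindles
    if is_raid:
        # % Disk Time can exceed 100% on RAID, so judge by Avg. Disk Queue Length.
        return mostly_above(avg_queue, queue_limit)
    # Non-RAID: both % Disk Time and Current Disk Queue Length must stay high.
    return (mostly_above(pct_disk_time, busy_pct)
            and mostly_above(current_queue, queue_limit))
```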
Physical Disk:
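Disk Transfers/sec: disk IOPS
% Disk Time: current physical disk utilization; on RAID this value can exceed 100%
Current Disk Queue Length: number of system requests currently waiting for disk access
Avg. Disk Queue Length: average number of system requests waiting for disk access; use this one for RAID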
Disk counters to monitor
Monitor the following counters to ensure the health of disks. Note that the following values are measured over time; they are not values that occur during a sudden spike, nor values based on a single measurement.
Physical Disk: % Disk Time: DataDrive
This counter shows the percentage of elapsed time that the selected disk drive is busy servicing read or write requests. Monitor this counter to ensure that it remains less than two times the number of disks.
Logical Disk: Disk Transfers/sec
This counter shows the rate at which read and write operations are performed on the disk. Use this counter to monitor growth trends and forecast appropriately.
Logical Disk: Disk Read Bytes/sec and Logical Disk: Disk Write Bytes/sec
These counters show the rate at which bytes are transferred from or to the disk during read or write operations.
Logical Disk: Avg. Disk Bytes/Read
This counter shows the average number of bytes transferred from the disk during read operations. This value can reflect disk latency: larger read operations can result in slightly increased latency.
Logical Disk: Avg. Disk Bytes/Write
This counter shows the average number of bytes transferred to the disk during write operations. This value can reflect disk latency: larger write operations can result in slightly increased latency.
Logical Disk: Current Disk Queue Length
This counter shows the number of requests outstanding on the disk at the time that the performance data is collected. For this counter, lower values are better. Values above 2 per disk may indicate a bottleneck and should be investigated. This means that a value of up to 8 may be acceptable for a LUN composed of 4 disks. Bottlenecks can create a backlog that can spread beyond the current server that is accessing the disk, and result in long wait times for users. Possible solutions to a bottleneck are to add more disks to the RAID array, replace existing disks with faster disks, or move some data to other disks.
Logical Disk: Avg. Disk Queue Length
This counter shows the average number of both read and write requests that were queued for the selected disk during the sample interval. The rule is that there should be two or fewer outstanding read and write requests per spindle, but this can be difficult to measure because of storage virtualization and differences in RAID levels between configurations. Look for larger than average disk queue lengths in combination with larger than average disk latencies. This combination can indicate that the storage array cache is being overused or that spindle sharing with other applications is affecting performance.
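The "larger than average queue length combined with larger than average latency" check can be sketched as below. The function name is hypothetical, and the two input lists (Avg. Disk Queue Length and Avg. Disk sec/Read or sec/Write) are assumed to be sampled at the same moments.

```python
from statistics import mean


def queue_and_latency_spikes(queue_lengths, latencies_sec):
    """Indexes of samples where queue length AND latency are both above average.

    queue_lengths: Avg. Disk Queue Length samples.
    latencies_sec: Avg. Disk sec/Read or sec/Write samples, aligned with queue_lengths.
    """
    avg_queue = mean(queue_lengths)
    avg_latency = mean(latencies_sec)
    return [i for i, (q, lat) in enumerate(zip(queue_lengths, latencies_sec))
            if q > avg_queue and lat > avg_latency]
```

Samples flagged by both conditions at once are the ones worth correlating with array cache statistics or with other workloads sharing the same spindles.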
Logical Disk: Avg. Disk sec/Read and Logical Disk: Avg. Disk sec/Write
These counters show the average time, in seconds, of a read or write operation to the disk. Monitor these counters to ensure that they remain below 85 percent of the disk capacity. Disk access time increases exponentially if read or write operations are more than 85 percent of disk capacity. To determine the specific capacity for your hardware, refer to the vendor documentation, or use the SQLIO Disk Subsystem Benchmark Tool to calculate it. For more information, see SQLIO Disk Subsystem Benchmark Tool (http://go.microsoft.com/fwlink/?LinkID=105586).
Logical Disk: Avg. Disk sec/Read
This counter shows the average time, in seconds, of a read operation from the disk. On a well-tuned system, ideal values are from 1-5 milliseconds (ms) for logs (ideally 1 ms on a cached array), and 4-20 ms for data (ideally less than 10 ms). Higher latencies can occur during peak times, but if high values occur regularly, you should investigate the cause.
Logical Disk: Avg. Disk sec/Write
This counter shows the average time, in seconds, of a write operation to the disk. On a well-tuned system, ideal values are from 1-5 ms for logs (ideally 1 ms on a cached array), and 4-20 ms for data (ideally less than 10 ms). Higher latencies can occur during peak times, but if high values occur regularly, you should investigate the cause.
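Because these counters report seconds while the guidance is in milliseconds, a sample has to be scaled before comparing. A minimal sketch using the log and data ranges quoted above; the three status labels and treating each boundary as inclusive are assumptions.

```python
# (ideal, upper end of acceptable) in milliseconds, from the ranges in the text.
LATENCY_TARGETS_MS = {
    "log": (1.0, 5.0),     # logs: 1-5 ms, ideally 1 ms on a cached array
    "data": (10.0, 20.0),  # data: 4-20 ms, ideally under 10 ms
}


def latency_status(avg_disk_sec, volume_type):
    """Classify one Avg. Disk sec/Read or sec/Write sample (given in seconds)."""
    ms = avg_disk_sec * 1000.0  # the counter reports seconds
    ideal, acceptable = LATENCY_TARGETS_MS[volume_type]
    if ms <= ideal:  # boundaries treated as inclusive (an assumption)
        return "ideal"
    if ms <= acceptable:
        return "acceptable"
    return "investigate"
```

A single "investigate" result during a peak is expected; the text's concern is when such values recur regularly.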
When you are using RAID configurations with the Avg. Disk sec/Read or Avg. Disk sec/Write counters, use the formulas listed in the following table to determine the rate of input and output on each disk.
RAID level   Formula
RAID 0       I/Os per disk = (reads + writes) / number of disks
RAID 1       I/Os per disk = [reads + (2 * writes)] / 2
RAID 5       I/Os per disk = [reads + (4 * writes)] / number of disks
RAID 10      I/Os per disk = [reads + (2 * writes)] / number of disks
For example, if you have a RAID 1 system that has two physical disks, and your counters are at the values that are shown in the following table:
Counter                              Value
Avg. Disk sec/Read                   80
Logical Disk: Avg. Disk sec/Write    70
Avg. Disk Queue Length               5
The I/O value per disk can be calculated as follows: (80 + (2 * 70))/2 = 110
The disk queue length can be calculated as follows: 5/2 = 2.5
In this situation, you have a borderline I/O bottleneck.
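The RAID formulas and the worked example can be reproduced in a few lines. This sketch treats `reads` and `writes` as per-second read and write operation counts, which is what the formulas combine; the function names are illustrative.

```python
def io_per_disk(raid_level, reads, writes, disks):
    """Per-disk I/O load from read and write operation counts (RAID table)."""
    if raid_level == 0:
        return (reads + writes) / disks
    if raid_level == 1:
        return (reads + 2 * writes) / 2  # mirrored pair: the formula divides by 2
    if raid_level == 5:
        return (reads + 4 * writes) / disks  # each logical write costs 4 disk I/Os
    if raid_level == 10:
        return (reads + 2 * writes) / disks
    raise ValueError(f"no formula for RAID level {raid_level}")


def queue_per_disk(avg_disk_queue_length, disks):
    """Average queue length per physical disk."""
    return avg_disk_queue_length / disks


# Worked example from the text: RAID 1 with two disks.
print(io_per_disk(1, 80, 70, 2))  # 110.0
print(queue_per_disk(5, 2))       # 2.5
```

A per-disk queue of 2.5 sits just above the rule of two or fewer outstanding requests per spindle, which is why the example calls it borderline.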
- From:http://technet.microsoft.com/en-us/library/dd723635(v=office.12).aspx