What to monitor on SQL Server

Time: 2023-01-13 04:07:41

I have been asked to monitor SQL Server (2005 and 2008) and am wondering which metrics are worth watching. I can access WMI counters, but I am unsure how much depth is going to be useful.

Currently I have on my list:


  • user connections
  • logins per second
  • latch waits per second
  • total latch wait time
  • deadlocks per second
  • errors per second
  • log and data file sizes

I want to monitor values that indicate degrading performance on the machine or a potentially serious issue. To that end, I am also wondering at what values each of these would be considered normal versus problematic.


As I reckon this would be a good question to have answered for the general community, I thought I'd court some of you DBA experts out there (I am certainly not one of them!).


Apologies if this is a rather open-ended question. Ry

7 answers



I would also monitor page life expectancy and your buffer cache hit ratio; see "Use sys.dm_os_performance_counters to get your Buffer cache hit ratio and Page life expectancy counters" for details.
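One detail that is easy to miss: in sys.dm_os_performance_counters the "Buffer cache hit ratio" row is not a percentage by itself and has to be divided by the matching "Buffer cache hit ratio base" row. A minimal Python sketch of that arithmetic follows; the helper name and the sample numbers are made up for illustration, and fetching the rows from the server is left out.

```python
def ratio_percent(counters, name):
    """Return a ratio counter as a percentage, or None if its base is zero or missing."""
    value = counters[name]
    base = counters.get(name + " base", 0)
    if base == 0:
        return None
    return 100.0 * value / base


# Hypothetical sample values, shaped like counter_name -> cntr_value rows
# read from sys.dm_os_performance_counters.
sample = {
    "Buffer cache hit ratio": 4950,
    "Buffer cache hit ratio base": 5000,
    "Page life expectancy": 1200,  # reported in seconds
}

hit_ratio = ratio_percent(sample, "Buffer cache hit ratio")
print(f"Buffer cache hit ratio: {hit_ratio:.1f}%")
print(f"Page life expectancy: {sample['Page life expectancy']} s")
```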




Late answer, but it may be of interest to other readers.


One of my colleagues had a similar problem and used this thread to get started. He also ran into a blog post describing common causes of performance issues and which metrics should be monitored besides the ones already mentioned here. These other metrics are:


• %Disk Time:


This counter indicates a disk problem, but it must be observed together with the Current Disk Queue Length counter to be truly informative. Note also that the disk can be a bottleneck before %Disk Time reaches 100%.


• %Disk Read Time and the %Disk Write Time:


The %Disk Read Time and %Disk Write Time metrics are similar to %Disk Time, but cover only read operations and only write operations, respectively. They are actually the Average Disk Read Queue Length and Average Disk Write Queue Length values presented as percentages.

• %Idle Time:

Measures the percentage of time the disk was idle during the sample interval. If this counter falls below 20 percent, the disk system is saturated. You may consider replacing the current disk system with a faster disk system.


• %Free Space:

Measures the percentage of free space on the selected logical disk drive. Take note if this falls below 15 percent, as you risk running out of free space for the OS to store critical files. One obvious solution here is to add more disk space.


If you would like to read the whole post, you may find it here: http://www.sqlshack.com/sql-server-disk-performance-metrics-part-2-important-disk-performance-measures/
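For what it's worth, the %Free Space check can be approximated outside Performance Monitor as well. The sketch below uses Python's standard shutil.disk_usage and the 15 percent figure quoted above; the function names are my own, and the path "." is only a placeholder for whichever drive holds your data or log files.

```python
import shutil

FREE_SPACE_WARN_PERCENT = 15.0  # threshold quoted in the answer above


def free_space_percent(total_bytes, free_bytes):
    """Free space as a percentage of the volume size."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes


def check_free_space(path):
    """Return (percent_free, is_below_threshold) for the volume holding path."""
    usage = shutil.disk_usage(path)
    pct = free_space_percent(usage.total, usage.free)
    return pct, pct < FREE_SPACE_WARN_PERCENT


# "." is a placeholder; point this at the drive you care about.
pct, low = check_free_space(".")
print(f"{pct:.1f}% free, below threshold: {low}")
```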




Use SQL Profiler to identify your top 10 (or more) queries. Establish a performance baseline for these queries. Compare current average execution times against that baseline, and alert when they are significantly above it. You can also use this list to identify queries for possible optimization.


This attacks the problem at a higher level than just reviewing detailed stats, although those stats can also be useful. I have found this approach to work on any DBMS, including MySQL and Oracle. If your top query times start to go up, you can bet you are starting to run into performance issues, which you can then start to drill into in more detail.
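As a rough illustration of the baseline comparison, here is a small Python sketch. The query names, the timings, and the 1.5x factor standing in for "significantly above" are all invented; in practice the numbers would come from your Profiler traces.

```python
from statistics import mean


def slow_queries(baseline_ms, recent_ms, factor=1.5):
    """Return {query: (baseline, current_average)} for queries whose recent
    average duration exceeds baseline * factor."""
    flagged = {}
    for name, base in baseline_ms.items():
        samples = recent_ms.get(name)
        if not samples:
            continue  # no recent executions captured for this query
        current = mean(samples)
        if current > base * factor:
            flagged[name] = (base, current)
    return flagged


# Hypothetical baseline and recent durations in milliseconds.
baseline = {"get_orders": 40.0, "update_stock": 12.0}
recent = {"get_orders": [85.0, 95.0, 90.0], "update_stock": [11.0, 13.0]}

for name, (base, current) in slow_queries(baseline, recent).items():
    print(f"{name}: baseline {base:.0f} ms, now {current:.0f} ms")
```

Tune the factor to your own tolerance; a noisy workload probably needs a wider margin than a steady one.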




Budget permitting, it's worth looking at some 3rd party tools to help. We use Idera's SQL Diagnostic Manager to monitor server health and Confio's Ignite to keep an eye on query performance. Both products have served us well in our shop.




Percent CPU utilization and average disk queue length are also pretty standard. CPU consistently over 80% indicates you may need more or better CPUs (and servers to house them); an average queue length consistently over 2 on any disk indicates a disk I/O bottleneck on that drive.
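The word "consistently" matters here, since a single spike should not raise an alert. One way to express that, sketched in Python: flag a counter only when it stays above its threshold for several consecutive samples. The sample values and the choice of four consecutive samples are assumptions for illustration.

```python
def sustained_above(samples, threshold, min_consecutive):
    """True if at least min_consecutive samples in a row exceed threshold."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical readings taken at a fixed interval.
cpu_percent = [62, 85, 91, 88, 93, 70]
disk_queue = [0.5, 3.1, 1.0, 2.4, 0.8, 0.3]

cpu_alert = sustained_above(cpu_percent, 80, 4)   # four samples in a row over 80%
disk_alert = sustained_above(disk_queue, 2, 4)    # only isolated spikes over 2
print(cpu_alert, disk_alert)
```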



You should monitor the total pages allocated to a particular process. You can get that information by querying the system dynamic management views.


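  -- The SELECT list was missing from the original post; the columns below are an assumed reconstruction.
  SELECT s.session_id,
         s.login_name,
         c.client_net_address,
         t.task_state,
         -- pages allocated by this task (user objects plus internal objects)
         tsu.user_objects_alloc_page_count + tsu.internal_objects_alloc_page_count AS pages_allocated,
         TSQL.text AS sql_text
  FROM sys.dm_exec_sessions s
   LEFT JOIN sys.dm_exec_connections c
        ON  s.session_id = c.session_id
   LEFT JOIN sys.dm_db_task_space_usage tsu
        ON  tsu.session_id = s.session_id
   LEFT JOIN sys.dm_os_tasks t
        ON  t.session_id = tsu.session_id
        AND t.request_id = tsu.request_id
   LEFT JOIN sys.dm_exec_requests r
        ON  r.session_id = tsu.session_id
        AND r.request_id = tsu.request_id
   OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) TSQL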

The following post explains really well how you can use it to monitor your server when nothing else works: http://tsqltips.blogspot.com/2012/06/monitor-current-sql-server-processes.html




Besides the performance metrics suggested above, I strongly recommend monitoring available memory, Batch Requests/sec, SQL Compilations/sec, and SQL Re-Compilations/sec. All are available in the sys.dm_os_performance_counters view and in Windows Performance Monitor.
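A caveat if you read these from sys.dm_os_performance_counters rather than Performance Monitor: the "/sec" counters there hold a running total since the instance started, so you need two samples and the time between them to get an actual rate. A minimal Python sketch of that calculation, with invented sample values and a hypothetical helper name:

```python
def per_second_rate(first_value, second_value, elapsed_seconds):
    """Rate between two samples of a cumulative counter.

    Returns None when the counter went backwards, which usually means the
    instance restarted between the two samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = second_value - first_value
    if delta < 0:
        return None
    return delta / elapsed_seconds


# Hypothetical cumulative values sampled 10 seconds apart.
batch_rate = per_second_rate(1_250_000, 1_256_000, 10)
compile_rate = per_second_rate(90_000, 90_450, 10)

print(f"Batch Requests/sec: {batch_rate:.0f}")
print(f"SQL Compilations/sec: {compile_rate:.0f}")
print(f"Compilations per batch: {compile_rate / batch_rate:.3f}")
```

Looking at compilations relative to batch requests, as in the last line, tends to be more telling than either number alone.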

As for

"ideally I'd like to organise monitored items into 3 categories, say 'FYI', 'Warning' & 'Critical'"


There are many third-party monitoring tools that let you create alerts at different severity levels, so once you determine what to monitor and what the recommended values are for your environment, you can set low, medium, and high alerts.
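If you end up rolling your own instead of buying a tool, the three categories can be sketched in a few lines of Python. Everything below is illustrative: the metric names, the critical thresholds, and the function are assumptions, and only the 80% CPU and 15% free space warning levels echo figures mentioned elsewhere in this thread.

```python
def classify(value, warning, critical, higher_is_worse=True):
    """Map a reading to 'FYI', 'Warning' or 'Critical'."""
    if not higher_is_worse:
        # Flip the sign so that "lower is worse" metrics use the same comparisons.
        value, warning, critical = -value, -warning, -critical
    if value >= critical:
        return "Critical"
    if value >= warning:
        return "Warning"
    return "FYI"


# metric -> (warning threshold, critical threshold, higher_is_worse)
rules = {
    "cpu_percent": (80, 95, True),
    "free_space_percent": (15, 5, False),
}
readings = {"cpu_percent": 97, "free_space_percent": 10}

levels = {name: classify(readings[name], *rules[name]) for name in rules}
print(levels)
```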


Check Brent Ozar's article on not-so-useful metrics.


