What to monitor on SQL Server

I have been asked to monitor SQL Server (2005 & 2008) and am wondering which metrics are worth looking at. I can access WMI counters, but I am slightly lost as to how much depth is going to be useful.

Currently I have on my list:

• User connections
• Logins per second
• Latch waits per second
• Total latch wait time
• Deadlocks per second
• Errors per second
• Log and data file sizes

I am looking to monitor values that will indicate a degradation of performance on the machine or a potentially serious issue. To this end, I am also wondering what values for some of these would be considered normal versus problematic.

As I reckon this would probably be a really good question to have answered for the general community, I thought I'd court some of you DBA experts out there (I am certainly not one of them!).

Apologies if this is a rather open-ended question. Ry

7 answers

#1

I would also monitor page life expectancy and your buffer cache hit ratio; see "Use sys.dm_os_performance_counters to get your Buffer cache hit ratio and Page life expectancy counters" for details.

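As a rough illustration of reading those two counters from sys.dm_os_performance_counters, a minimal sketch follows. It assumes the Buffer Manager object name contains the text "Buffer Manager" (the prefix differs between default and named instances), and the column aliases are illustrative only.

```sql
-- Page life expectancy: how many seconds a data page is expected
-- to stay in the buffer pool without being referenced.
SELECT cntr_value AS page_life_expectancy_sec
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Buffer Manager%'
  AND counter_name = 'Page life expectancy';

-- Buffer cache hit ratio: the raw counter is only meaningful
-- when divided by its companion "base" counter.
SELECT CAST(r.cntr_value AS decimal(18, 2)) * 100
         / NULLIF(b.cntr_value, 0) AS buffer_cache_hit_ratio_pct
FROM sys.dm_os_performance_counters AS r
JOIN sys.dm_os_performance_counters AS b
  ON b.[object_name] = r.[object_name]
WHERE r.[object_name] LIKE '%Buffer Manager%'
  AND r.counter_name = 'Buffer cache hit ratio'
  AND b.counter_name = 'Buffer cache hit ratio base';
```

Both values are point-in-time readings, so they are probably most useful when sampled on a schedule and compared against your own baseline.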

#2

A late answer, but it may be of interest to other readers.

One of my colleagues had a similar problem and used this thread to help him get started. He also came across a blog post describing common causes of performance issues, with guidance on which metrics should be monitored besides the ones already mentioned here. These other metrics are:

• %Disk Time:

This counter indicates a disk problem, but must be observed in conjunction with the Current Disk Queue Length counter to be truly informative. Recall also that the disk could be a bottleneck prior to the %Disk Time reaching 100%.

• %Disk Read Time and %Disk Write Time:

The %Disk Read Time and %Disk Write Time metrics are similar to %Disk Time, but cover only read operations and write operations, respectively. They are actually the Average Disk Read Queue Length and Average Disk Write Queue Length values presented as percentages.

• %Idle Time:

Measures the percentage of time the disk was idle during the sample interval. If this counter falls below 20 percent, the disk system is saturated. You may consider replacing the current disk system with a faster disk system.

• %Free Space:

Measures the percentage of free space on the selected logical disk drive. Take note if this falls below 15 percent, as you risk running out of free space for the OS to store critical files. One obvious solution here is to add more disk space.

If you would like to read the whole post, you may find it here: http://www.sqlshack.com/sql-server-disk-performance-metrics-part-2-important-disk-performance-measures/

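The disk counters above are Windows counters read through Performance Monitor or WMI. As a complementary view from inside SQL Server (not something the blog post describes, so treat it as an assumption about what may be useful), the sketch below reads per-file I/O stall times from sys.dm_io_virtual_file_stats. The column aliases are illustrative only.

```sql
-- Cumulative I/O statistics per database file since the instance started.
-- A high average stall per read or write suggests the underlying disk is slow.
SELECT DB_NAME(vfs.database_id)            AS database_name,
       mf.physical_name,
       vfs.size_on_disk_bytes / 1048576    AS size_on_disk_mb,
       vfs.num_of_reads,
       vfs.num_of_writes,
       -- integer division; NULL when the file has had no reads or writes
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_stall_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_stall_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY vfs.io_stall DESC;
```

Because the figures are cumulative, comparing two snapshots taken some minutes apart usually says more about current behaviour than a single reading.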

#3

Use SQL Profiler to identify your top 10 (or more) queries. Establish a performance baseline for these queries. Review current average execution times against your baseline, and alert if they are significantly above it. You can also use this list to identify queries for possible optimization.

This attacks the problem at a higher level than just reviewing detailed stats, although those stats can also be useful. I have found this approach to work on any DBMS, including MySQL and Oracle. If your top query times start to go up, you can bet you are starting to run into performance issues, which you can then start to drill into in more detail.

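The answer above uses SQL Profiler. As an alternative way to build a similar top 10 list, the sketch below queries sys.dm_exec_query_stats instead, which is a different technique: it only covers statements whose plans are still in the plan cache, and the figures reset when a plan is evicted or the instance restarts. The column aliases are illustrative only.

```sql
-- Top 10 cached statements by total elapsed time.
-- Times in sys.dm_exec_query_stats are reported in microseconds.
SELECT TOP (10)
       qs.execution_count,
       qs.total_elapsed_time  / qs.execution_count / 1000 AS avg_elapsed_ms,
       qs.total_worker_time   / qs.execution_count / 1000 AS avg_cpu_ms,
       qs.total_logical_reads / qs.execution_count        AS avg_logical_reads,
       -- offsets are byte positions in a Unicode batch, hence the division by 2
       SUBSTRING(st.text,
                 qs.statement_start_offset / 2 + 1,
                 (CASE WHEN qs.statement_end_offset = -1
                       THEN DATALENGTH(st.text)
                       ELSE qs.statement_end_offset
                  END - qs.statement_start_offset) / 2 + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time DESC;
```

Storing the output on a schedule would give a baseline to compare later averages against, in the spirit of the approach described above.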

#4

Budget permitting, it's worth looking at some third-party tools to help. We use Idera's SQL Diagnostic Manager to monitor server health and Confio's Ignite to keep an eye on query performance. Both products have served us well in our shop.

#5

Percent CPU utilization and average disk queue lengths are also pretty standard. CPU utilization consistently over 80% indicates you may need more or better CPUs (and servers to house them); an average consistently over 2 on any disk queue indicates a disk I/O bottleneck on that drive.

#6

You should monitor the total pages allocated to a particular process. You can get that information by querying the system dynamic management views:

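  -- The column list is illustrative; the original snippet showed only the joins.
  SELECT s.session_id, s.login_name, s.host_name, s.program_name,
         c.client_net_address, r.status, r.command, t.task_state,
         -- tempdb pages allocated by this task (user + internal objects)
         tsu.user_objects_alloc_page_count
           + tsu.internal_objects_alloc_page_count AS allocated_pages,
         TSQL.text AS sql_text
  FROM sys.dm_exec_sessions s
   LEFT  JOIN sys.dm_exec_connections c
        ON  s.session_id = c.session_id
   LEFT JOIN sys.dm_db_task_space_usage tsu
        ON  tsu.session_id = s.session_id
   LEFT JOIN sys.dm_os_tasks t
        ON  t.session_id = tsu.session_id
        AND t.request_id = tsu.request_id
   LEFT JOIN sys.dm_exec_requests r
        ON  r.session_id = tsu.session_id
        AND r.request_id = tsu.request_id
   OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) TSQL
  WHERE s.is_user_process = 1
  ORDER BY allocated_pages DESC;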

The following post explains really well how you can use it to monitor your server when nothing else works: http://tsqltips.blogspot.com/2012/06/monitor-current-sql-server-processes.html

#7

Besides the performance metrics suggested above, I strongly recommend monitoring available memory, Batch Requests/sec, SQL Compilations/sec, and SQL Re-Compilations/sec. All are available in Windows Performance Monitor, and the SQL Server counters are also exposed in the sys.dm_os_performance_counters view.

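One thing to be aware of when reading the "/sec" counters from sys.dm_os_performance_counters is that they are stored as cumulative totals, so a rate has to be derived from two samples. A minimal sketch, assuming a 10-second sampling interval (an arbitrary choice) and a temporary table named #sample1 (a hypothetical name):

```sql
-- First sample of the cumulative counters.
SELECT counter_name, cntr_value
INTO #sample1
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%SQL Statistics%'
  AND counter_name IN ('Batch Requests/sec',
                       'SQL Compilations/sec',
                       'SQL Re-Compilations/sec');

WAITFOR DELAY '00:00:10';  -- sampling interval: 10 seconds

-- Second sample; the difference divided by the interval gives the rate.
SELECT RTRIM(s2.counter_name) AS counter_name,
       (s2.cntr_value - s1.cntr_value) / 10.0 AS per_second
FROM sys.dm_os_performance_counters AS s2
JOIN #sample1 AS s1
  ON s1.counter_name = s2.counter_name
WHERE s2.[object_name] LIKE '%SQL Statistics%';

DROP TABLE #sample1;
```

A compilation rate that is high relative to the batch request rate may point at poor plan reuse, though what counts as high depends on the workload.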

As for

"ideally I'd like to organise monitored items into 3 categories, say 'FYI', 'Warning' & 'Critical'"

There are many third-party monitoring tools that let you create alerts of different severity levels, so once you determine what to monitor and what the recommended values are for your environment, you can set low, medium, and high alerts.

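If you would rather prototype the three categories yourself before adopting a tool, a CASE expression over a counter is one simple way to do it. The sketch below buckets the Percent Log Used counter per database; the metric choice and the 70 and 90 percent thresholds are placeholders assumed for illustration, not recommended values.

```sql
-- Hypothetical thresholds; replace with values derived from your own baseline.
DECLARE @warning_pct int, @critical_pct int;
SET @warning_pct  = 70;
SET @critical_pct = 90;

SELECT RTRIM(instance_name) AS database_name,
       cntr_value           AS percent_log_used,
       CASE WHEN cntr_value >= @critical_pct THEN 'Critical'
            WHEN cntr_value >= @warning_pct  THEN 'Warning'
            ELSE 'FYI'
       END AS severity
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Databases%'
  AND counter_name = 'Percent Log Used'
  AND instance_name <> '_Total'
ORDER BY cntr_value DESC;
```

The variables are declared and set in separate statements so that the script also runs on SQL Server 2005, which does not allow initializing a variable in DECLARE.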

Also check Brent Ozar's article on not-so-useful metrics.
