SQL Server 2008 Performance Troubleshooting (Part 1): Overview

Time: 2022-05-28 05:02:12

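Note: I spent a great deal of my off-work time on this translation. It is not plagiarized. Reposting is allowed, but please credit the source. Because the paper is long, it cannot fit into a single post, and I cannot translate all of it quickly, so I am publishing it chapter by chapter. My skills are limited and the translation certainly contains mistakes. To avoid misleading anyone, I attach the original text at the end of each post for reference. I would also appreciate having my mistakes pointed out so that I can improve. Thank you.

Also, this series is written for database developers and DBAs with some experience; beginners may find it hard to follow. I ask for your understanding.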

Authors: Sunil Agarwal, Boris Baryshnikov, Keith Elmore, Juergen Thomas, Kun Cheng, Burzin Patel

Technical reviewers: Jerome Halmans, Fabricio Voznika, George Reynya

Published: March 2009

Applies to: SQL Server 2008

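Summary:

Sometimes a poorly designed database or a system that is improperly configured for the workload causes SQL Server to slow down. DBAs need to proactively prevent or minimize problems and, when they occur, diagnose the cause and take corrective action. This paper provides step-by-step guidelines for diagnosing and troubleshooting common performance problems by using publicly available tools such as SQL Server Profiler, Performance Monitor, dynamic management views (DMVs), and SQL Server Extended Events and the data collector, which are new in SQL Server 2008.

Copyright: this part is omitted here; please simply respect other people's work.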


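Introduction:

It is not uncommon for a database running SQL Server to slow down occasionally. The causes range from a poorly designed database to a system that is improperly configured for the workload. As a DBA, you want to proactively prevent or minimize problems; when they occur, you want to diagnose the cause and take corrective action. This white paper provides step-by-step guidelines for diagnosing and troubleshooting common performance problems by using tools such as SQL Server Profiler, Performance Monitor, DMVs, SQL Server Extended Events, and the data collector. The scope is limited to the problems that customers report most often, because an exhaustive analysis of all possible problems is not feasible.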

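Goals:

The primary goal of this paper is to provide a general methodology for diagnosing and troubleshooting performance problems, relying mainly on publicly available tools. SQL Server 2008 has made great strides in supportability. New dynamic management views (DMVs) have been added, such as sys.dm_os_memory_brokers, sys.dm_os_memory_nodes, and sys.dm_exec_procedure_stats. Existing DMVs (introduced in SQL Server 2005), such as sys.dm_os_sys_info and sys.dm_exec_requests, have been enriched with additional information. You can use DMVs together with existing tools, such as SQL Server Profiler and Performance Monitor, to collect performance-related data for analysis.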
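As a rough illustration of what the new DMVs expose, the following query sketches one way to list the cached stored procedures that have consumed the most CPU time, using sys.dm_exec_procedure_stats. The TOP (10) cutoff and the choice of total_worker_time as the sort key are assumptions made for this example, not recommendations from the paper.

```sql
-- Sketch: top cached stored procedures by cumulative CPU time.
-- The cutoff (10) and the sort key are illustrative assumptions.
SELECT TOP (10)
    DB_NAME(ps.database_id)                   AS database_name,
    OBJECT_NAME(ps.object_id, ps.database_id) AS procedure_name,
    ps.execution_count,
    ps.total_worker_time,      -- CPU time, in microseconds
    ps.total_elapsed_time,     -- wall-clock time, in microseconds
    ps.total_logical_reads,
    ps.cached_time,            -- when the plan entered the cache
    ps.last_execution_time
FROM sys.dm_exec_procedure_stats AS ps
ORDER BY ps.total_worker_time DESC;
```

The view only covers procedures whose plans are still in the plan cache, so the totals start over when a plan is evicted or the instance restarts.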

The secondary goal is to introduce the new troubleshooting tools and features in SQL Server 2008, including Extended Events and the data collector.



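Methodology:

There can be many reasons for a slowdown in SQL Server. This paper uses the following three key symptoms as starting points for diagnosis:

  • Resource bottlenecks: CPU, memory, and I/O bottlenecks are covered in this paper. Network issues are not considered. For each resource bottleneck, we describe how to identify the problem and then iterate through the possible causes. For example, a memory bottleneck can lead to excessive paging, which hurts performance.
  • tempdb bottlenecks: Because each SQL Server instance has only one tempdb, shared by all of its databases, tempdb can become both a performance bottleneck and a disk space bottleneck. An application can overload tempdb through excessive DDL or DML operations and by taking too much space. This can cause unrelated applications running on the same server to slow down or fail.
  • A slow-running user query: The performance of an existing query might regress, or a new query might take longer than expected. Typical causes include:

  1. Changes in statistics can lead the optimizer to choose a poor plan for an existing query.

  2. Missing indexes can force table scans and slow down the query.

  3. An application can slow down because of blocking even if resource utilization is normal.

  4. Excessive blocking can result from poor application or schema design, or from an improper transaction isolation level.

These causes should not be analyzed in isolation. A poor query plan can tax system resources and slow down the whole workload. For example, if a large table is missing a useful index, or if the query optimizer decides not to use it, the query can slow down; the same condition also puts heavy pressure on the I/O subsystem, which must read unnecessary data pages, and on memory (the buffer pool), which must cache those pages. Similarly, excessive recompilation of a frequently run query can put pressure on the CPU.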
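The missing-index case can be checked with the missing index DMVs, which have been available since SQL Server 2005. The query below is a minimal sketch: the ranking expression is a commonly used heuristic rather than anything defined in this paper, and the TOP (20) cutoff is an arbitrary assumption.

```sql
-- Sketch: index suggestions recorded by the optimizer, ranked by a
-- heuristic "benefit" score (an assumption, not an official metric).
SELECT TOP (20)
    d.[statement]          AS table_name,   -- fully qualified table name
    d.equality_columns,
    d.inequality_columns,
    d.included_columns,
    s.user_seeks,
    s.user_scans,
    s.avg_total_user_cost,                  -- average cost of affected queries
    s.avg_user_impact                       -- estimated improvement, in percent
FROM sys.dm_db_missing_index_group_stats AS s
JOIN sys.dm_db_missing_index_groups      AS g
    ON g.index_group_handle = s.group_handle
JOIN sys.dm_db_missing_index_details     AS d
    ON d.index_handle = g.index_handle
ORDER BY s.avg_total_user_cost * s.avg_user_impact
         * (s.user_seeks + s.user_scans) DESC;
```

The rows are suggestions only. They are cleared when the instance restarts, and each one should be reviewed against the existing indexes and the write workload before an index is created.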

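New performance tools in SQL Server 2008:

SQL Server 2008 introduced new tools and features to help with monitoring and troubleshooting. This paper discusses two of them: Extended Events and the data collector.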

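Resource Bottlenecks:

The following sections discuss CPU, memory, and I/O subsystem resources and how they can become bottlenecks. (Network issues are outside the scope of this paper.) For each resource bottleneck, we describe how to identify the problem and then iterate through the possible causes. For example, a memory bottleneck can lead to excessive paging, which hurts performance.

Before you can determine whether you have a resource bottleneck, you need to know how resources are used under normal circumstances. You can use the methods described in this paper to collect a performance baseline, that is, performance data gathered at a time when you are not having performance problems.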
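One possible way to keep such a baseline is to snapshot SQL Server's own performance counters into a table at regular intervals. The sketch below assumes a hypothetical table named dbo.perf_baseline and an arbitrary selection of counter objects; neither comes from the paper.

```sql
-- Sketch: store a timestamped snapshot of selected performance counters.
-- dbo.perf_baseline is a hypothetical table name chosen for this example.
IF OBJECT_ID(N'dbo.perf_baseline', N'U') IS NULL
    CREATE TABLE dbo.perf_baseline (
        capture_time  datetime      NOT NULL,
        object_name   nvarchar(128) NOT NULL,
        counter_name  nvarchar(128) NOT NULL,
        instance_name nvarchar(128) NULL,
        cntr_value    bigint        NOT NULL,
        cntr_type     int           NOT NULL
    );

INSERT INTO dbo.perf_baseline
    (capture_time, object_name, counter_name, instance_name, cntr_value, cntr_type)
SELECT
    GETDATE(),
    RTRIM(object_name),     -- the DMV pads these columns with trailing spaces
    RTRIM(counter_name),
    RTRIM(instance_name),
    cntr_value,
    cntr_type
FROM sys.dm_os_performance_counters
WHERE object_name LIKE N'%Buffer Manager%'   -- assumed selection of objects
   OR object_name LIKE N'%SQL Statistics%';
```

Many of the "/sec" counters in this view hold cumulative values, so a rate has to be derived from the difference between two snapshots divided by the elapsed time.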

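You might find that a resource is running near capacity and that SQL Server cannot support the workload in its current configuration. To address this, you may need to add more processing power or memory, or increase the bandwidth of your I/O or network channel. Before you take that step, however, it is useful to understand some common causes of resource bottlenecks. Some solutions, such as reconfiguration, do not require additional resources.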



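Tools for Resolving Resource Bottlenecks:


One or more of the following tools can be used to resolve a particular resource bottleneck:

  • Performance Monitor: Available as part of the Windows operating system. For more information, see your Windows documentation.

  • SQL Server Profiler: Found in the Performance Tools group of the SQL Server 2008 program group. For more information, see Books Online.

  • DBCC commands: See Appendix A and Books Online.

  • DMVs: See Books Online.

  • Extended Events: See the Extended Events section later in this paper and Books Online.

  • Data collector and the management data warehouse (MDW): See the Data Collector and the MDW section later in this paper and Books Online.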
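To see which DMVs a given instance actually offers before looking them up in Books Online, the system catalog can be queried directly. This is a small sketch that relies only on the convention that dynamic management objects live in the sys schema and have names beginning with dm_.

```sql
-- Sketch: list the dynamic management views and functions on this instance.
SELECT
    name,
    type_desc     -- distinguishes views from table-valued functions
FROM sys.system_objects
WHERE name LIKE N'dm[_]%'    -- [_] matches a literal underscore
ORDER BY name;
```

The second part of each name (os, exec, db, io, and so on) indicates the area the object reports on, which helps when picking a starting point for CPU, memory, or I/O investigation.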

Next section: CPU Bottlenecks

Original text:

Troubleshooting Performance Problems in SQL Server 2008

SQL Server Technical Article



Writers: Sunil Agarwal, Boris Baryshnikov, Keith Elmore, Juergen Thomas, Kun Cheng, Burzin Patel

Technical Reviewers: Jerome Halmans, Fabricio Voznika, George Reynya



Published: March 2009

Applies to: SQL Server 2008



Summary: Sometimes a poorly designed database or a system that is improperly configured for the workload can cause slowdowns in SQL Server. Administrators need to proactively prevent or minimize problems and, when they occur, diagnose the cause and take corrective action. This paper provides step-by-step guidelines for diagnosing and troubleshooting common performance problems by using publicly available tools such as SQL Server Profiler, Performance Monitor, dynamic management views, and SQL Server Extended Events (Extended Events) and the data collector, which are new in SQL Server 2008.





Copyright



The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in, or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place, or event is intended or should be inferred.





© 2009 Microsoft Corporation. All rights reserved.





Microsoft, MSDN, SQL Server, Win32, Windows, Windows Server, and Windows Vista are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.





All other trademarks are property of their respective owners.









Table of Contents

Introduction

Goals

Methodology

Resource Bottlenecks

Tools for Resolving Resource Bottlenecks

CPU Bottlenecks

Excessive Query Compilation and Optimization

Detection

Resolution

Unnecessary Recompilation

Detection

Resolution

Inefficient Query Plan

Detection

Resolution

Intraquery Parallelism

Detection

Resolution

Poor Cursor Usage

Detection

Resolution

Memory Bottlenecks

Background

Virtual Address Space and Physical Memory

AWE, Locked Pages, and SQL Server

Memory Pressures

Detecting Memory Pressures

Tools for Memory Diagnostics

New DMVs in SQL Server 2008

Resource Governor in SQL Server 2008

External Physical Memory Pressure

External Virtual Memory Pressure

Internal Physical Memory Pressure

Caches and Memory Pressure

Ring Buffers

Internal Virtual Memory Pressure

General Troubleshooting Steps in Case of Memory Errors

Memory Errors

I/O Bottlenecks

Resolution

tempdb

Monitoring tempdb Space

Troubleshooting Space Issues

User Objects

Version Store

Internal Objects

Excessive DDL and Allocation Operations

Resolution

Slow-Running Queries

Blocking

Locking Granularity and Lock Escalation

Identifying Long Blocks

Blocking per Object with sys.dm_db_index_operational_stats

Overall Performance Effect of Blocking Using Waits

Monitoring Index Usage

Extended Events

Data Collector and the MDW

Appendix A: DBCC MEMORYSTATUS Description

Appendix B: MDW Data Collection







Introduction

It’s not uncommon to experience the occasional slowdown of a database running the Microsoft® SQL Server® database software. The reasons can range from a poorly designed database to a system that is improperly configured for the workload. As an administrator, you want to proactively prevent or minimize problems; if they occur, you want to diagnose the cause and take corrective actions to fix the problem whenever possible. This white paper provides step-by-step guidelines for diagnosing and troubleshooting common performance problems by using publicly available tools such as SQL Server Profiler; System Monitor (in the Windows Server® 2003 operating system) or Performance Monitor (in the Windows Vista® operating system and Windows Server 2008), also known as Perfmon; dynamic management views (sometimes referred to as DMVs); and SQL Server Extended Events (Extended Events) and the data collector, which are new in SQL Server 2008. We have limited the scope of this white paper to the problems commonly seen by Microsoft Customer Service and Support, because an exhaustive analysis of all possible problems is not feasible.

Goals

The primary goal of this paper is to provide a general methodology for diagnosing and troubleshooting SQL Server performance problems in common customer scenarios by using publicly available tools.

SQL Server 2008 has made great strides in supportability. New dynamic management views (DMVs) have been added, like sys.dm_os_memory_brokers, sys.dm_os_memory_nodes, and sys.dm_exec_procedure_stats. Existing DMVs such as sys.dm_os_sys_info and sys.dm_exec_requests have been enriched with additional information. You can use DMVs and existing tools, like SQL Server Profiler and Performance Monitor, to collect performance-related data for analysis.

The secondary goal of this paper is to introduce new troubleshooting tools and features in SQL Server 2008, including Extended Events and the data collector.

Methodology

There can be many reasons for a slowdown in SQL Server. We use the following three key symptoms to start diagnosing problems:

• Resource bottlenecks: CPU, memory, and I/O bottlenecks are covered in this paper. We do not consider network issues. For each resource bottleneck, we describe how to identify the problem and then iterate through the possible causes. For example, a memory bottleneck can lead to excessive paging that ultimately impacts performance.

• tempdb bottlenecks: Because there is only one tempdb for each SQL Server instance, it can be a performance and a disk space bottleneck. An application can overload tempdb through excessive DDL or DML operations and by taking too much space. This can cause unrelated applications running on the server to slow down or fail.

• A slow-running user query: The performance of an existing query might regress, or a new query might appear to be taking longer than expected. There can be many reasons for this. For example:

o Changes in statistical information can lead to a poor query plan for an existing query.

o Missing indexes can force table scans and slow down the query.

o An application can slow down due to blocking even if resource utilization is normal.

o Excessive blocking can be due to poor application or schema design or the choice of an improper isolation level for the transaction.

The causes of these symptoms are not necessarily independent of each other. The poor choice of a query plan can tax system resources and cause an overall slowdown of the workload. So, if a large table is missing a useful index, or if the query optimizer decides not to use it, the query can slow down; these conditions also put heavy pressure on the I/O subsystem to read the unnecessary data pages and on the memory (buffer pool) to store these pages in the cache. Similarly, excessive recompilation of a frequently run query can put pressure on the CPU.

New Performance Tools in SQL Server 2008

SQL Server 2008 introduced new features and tools that you can use to monitor and troubleshoot performance problems. We’ll discuss two features: Extended Events and the data collector.

Resource Bottlenecks

The next sections of this paper discuss CPU, memory, and I/O subsystem resources and how these can become bottlenecks. (Network issues are outside of the scope of this paper.) For each resource bottleneck, we describe how to identify the problem and then iterate through the possible causes. For example, a memory bottleneck can lead to excessive paging, which can ultimately impact performance.

Before you can determine whether you have a resource bottleneck, you need to know how resources are used under normal circumstances. You can use the methods outlined in this paper to collect baseline information about the use of the resource (at a time when you are not having performance problems).

You might find that the problem is a resource that is running near capacity and that SQL Server cannot support the workload in its current configuration. To address this issue, you may need to add more processing power or memory, or you may need to increase the bandwidth of your I/O or network channel. However, before you take that step, it is useful to understand some common causes of resource bottlenecks. Some solutions, such as reconfiguration, do not require the addition of more resources.

Tools for Resolving Resource Bottlenecks

One or more of the following tools can be used to resolve a particular resource bottleneck:

• Performance Monitor: This tool is available as part of the Windows® operating system. For more information, see your Windows documentation.

• SQL Server Profiler: See SQL Server Profiler in the Performance Tools group in the SQL Server 2008 program group. For more information, see SQL Server 2008 Books Online.

• DBCC commands: For more information, see SQL Server 2008 Books Online and Appendix A.

• DMVs: For more information, see SQL Server 2008 Books Online.

• Extended Events: For more information, see Extended Events later in this paper and SQL Server 2008 Books Online.

• Data collector and the management data warehouse (MDW): For more information, see Data Collector and the MDW later in this paper and SQL Server 2008 Books Online.