Reduce SQL Server table fragmentation without adding/dropping a clustered index?

I have a large database (90GB data, 70GB indexes) that's been slowly growing for the past year, and the growth/changes have caused a large amount of internal fragmentation, not only of the indexes but of the tables themselves.

It's easy to resolve the (large number of) very fragmented indexes: a REORGANIZE or REBUILD will take care of that, depending on how fragmented they are. But the only advice I can find on cleaning up actual table fragmentation is to add a clustered index to the table. I'd drop it immediately afterwards, as I don't want a clustered index on the table going forward, but is there another way of doing this without the clustered index? Is there a DBCC command that will do this?
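
This is a minimal sketch of the index side, assuming SQL Server 2005 or later. The query lists fragmented indexes and suggests REORGANIZE or REBUILD based on the commonly quoted 5% / 30% thresholds and a 1,000-page floor. Those numbers are rules of thumb, not requirements. The index and table names in the trailing comments are placeholders.

```sql
-- Sketch: list fragmented indexes in the current database and suggest an action.
-- Heaps (index_id = 0) are excluded: they have no index to REORGANIZE or REBUILD.
SELECT  OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
        OBJECT_NAME(ips.object_id)        AS table_name,
        i.name                            AS index_name,
        ips.avg_fragmentation_in_percent,
        ips.page_count,
        CASE WHEN ips.avg_fragmentation_in_percent > 30
             THEN 'REBUILD' ELSE 'REORGANIZE' END AS suggested_action
FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN    sys.indexes AS i
        ON  i.object_id = ips.object_id
        AND i.index_id  = ips.index_id
WHERE   ips.index_id > 0
  AND   ips.alloc_unit_type_desc = 'IN_ROW_DATA'
  AND   ips.avg_fragmentation_in_percent > 5
  AND   ips.page_count > 1000                 -- skip small indexes
ORDER BY ips.avg_fragmentation_in_percent DESC;

-- Then, per index (placeholder names):
-- ALTER INDEX IX_BigHeap_SomeColumn ON dbo.BigHeap REORGANIZE;
-- ALTER INDEX IX_BigHeap_SomeColumn ON dbo.BigHeap REBUILD;
```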

Thanks for your help.

5 Solutions

#1

Problem

Let's get some clarity, because this is a common problem, a serious issue for every company using SQL Server.

This problem, and the need for CREATE CLUSTERED INDEX, is misunderstood.

Agreed, having a permanent Clustered Index is better than not having one. But that is not the point, and it would lead into a long discussion anyway, so let's set it aside and focus on the posted question.

The point is, you have substantial fragmentation in the Heap. You keep calling it a "table", but there is no such thing at the physical data storage or DataStructure level. A table is a logical concept, not a physical one: it is a collection of physical DataStructures. That collection takes one of two forms:

- a Heap
  plus all Non-clustered Indices
  plus Text/Image chains
- or a Clustered Index (which eliminates the Heap and one Non-clustered Index)
  plus all Non-clustered Indices
  plus Text/Image chains.

Heaps get badly fragmented; the more interspersed (random) Inserts, Deletes and Updates there are, the more fragmentation.

There is no way to clean up the Heap, as is. MS does not provide a facility (other vendors do).

Solution

However, we know that CREATE CLUSTERED INDEX completely rewrites and re-orders the Heap. The method (not a trick), therefore, is to create a Clustered Index solely for the purpose of de-fragmenting the Heap, and to drop it afterwards. You need free space in the database of table_size x 1.25.
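
This is a rough sketch of the sizing rule. It assumes "table_size" means the heap's own pages (in-row, row-overflow and LOB), not its nonclustered indexes. The first query applies the 1.25 multiplier to the heap's reserved pages. The second shows the free space in each data file. `dbo.BigHeap` is a placeholder name.

```sql
-- Sketch: apply the "table_size x 1.25" rule of thumb to the heap's own pages.
SELECT  SUM(ps.reserved_page_count) * 8 / 1024.0        AS heap_reserved_mb,
        SUM(ps.reserved_page_count) * 8 / 1024.0 * 1.25 AS free_space_needed_mb
FROM    sys.dm_db_partition_stats AS ps
WHERE   ps.object_id = OBJECT_ID(N'dbo.BigHeap')   -- placeholder table name
  AND   ps.index_id  = 0;                           -- 0 = the heap itself

-- Free space currently available in each data file of this database
-- (size and SpaceUsed are in 8 KB pages).
SELECT  name,
        size * 8 / 1024.0                                     AS file_size_mb,
        (size - FILEPROPERTY(name, 'SpaceUsed')) * 8 / 1024.0 AS free_mb
FROM    sys.database_files
WHERE   type_desc = 'ROWS';
```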

While you are at it, by all means use FILLFACTOR to reduce future fragmentation. The Heap will then take more allocated space, leaving room for future Inserts and for row expansions due to Updates.
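
This sketches the create/drop method with a FILLFACTOR. The names (`dbo.BigHeap`, `SomeColumn`, `CIX_BigHeap_Defrag`) are placeholders, and the 80% fill is only for illustration. Nonclustered indexes locate rows through the row locator, which is a RID on a heap and the clustering key otherwise. As a result, they are all rebuilt on both the CREATE and the DROP, so allow for that in run time and log space. On SQL Server 2008 and later, `ALTER TABLE ... REBUILD` rebuilds a heap directly; it is shown commented out at the end.

```sql
-- Sketch of the create/drop method (placeholder names).
CREATE CLUSTERED INDEX CIX_BigHeap_Defrag
    ON dbo.BigHeap (SomeColumn)
    WITH (FILLFACTOR = 80, SORT_IN_TEMPDB = ON);

-- Dropping the clustered index turns the table back into a heap; the leaf
-- pages written by the CREATE above become the heap's data pages.
DROP INDEX CIX_BigHeap_Defrag ON dbo.BigHeap;

-- SQL Server 2008 and later only: rebuild the heap without a temporary index.
-- ALTER TABLE dbo.BigHeap REBUILD;
```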

Note

Note that there are three Levels of Fragmentation; this deals with Level III only: fragmentation within the Heap, which is caused by the lack of a Clustered Index.

As a separate task, at some other time, you may wish to consider implementing a permanent Clustered Index, which greatly reduces fragmentation, but that is separate from the posted problem.

Response to Comment

SqlRyan:
While this doesn't give me a magic solution to my problem, it makes pretty clear that my problem is a result of a SQL Server limitation and adding a clustered index is the only way to "defragment" the heap.

Not quite. I wouldn't call it a "limitation".

The method I have given to eliminate the Fragmentation in the Heap is to create a Clustered Index and then drop it, i.e. a temporary Clustered Index whose only purpose is to correct the Fragmentation.

Implementing a permanent Clustered Index on the table is a much better solution, because it reduces overall Fragmentation. The DataStructure can still become Fragmented (refer to the detailed information in the links below), but far less than a Heap does.

- Every table in a Relational database (except "pipe" or "queue" tables) should have a Clustered Index, in order to take advantage of its various benefits.
- The Clustered Index should be on columns that distribute the data (avoiding INSERT conflicts). It should never be on a monotonically increasing column, such as a Record ID ¹, which guarantees an INSERT Hot Spot on the last Page.

¹ Record IDs on every File render your "database" a non-relational Record Filing System, using SQL merely for convenience. Such Files have none of the Integrity, Power, or Speed of Relational databases.

Andrew Hill:
would you be able to comment further on "Note that there are three Levels of Fragmentation; this deals with Level III only" -- what are the other two levels of fragmentation?

In MS SQL and Sybase ASE, there are three Levels of Fragmentation, and within each Level, several different Types. Keep in mind that when dealing with Fragmentation, we must focus on DataStructures, not on tables (a table is a collection of DataStructures, as explained above). The Levels are:

Level I • Extra-DataStructure
Outside the DataStructure concerned, across or within the database.

Level II • DataStructure
Within the DataStructure concerned, above Pages (across all Pages).
This is the Level most frequently addressed by DBAs.

Level III • Page
Within the DataStructure concerned, within the Pages.

These links provide full detail on Fragmentation. They are specific to Sybase ASE; however, at the structural level, the information also applies to MS SQL.

Note that the method I have given is a Level II method: it corrects both Level II and Level III Fragmentation.

#2

You state that you would add a clustered index to alleviate the table fragmentation and then drop it immediately.

The clustered index removes fragmentation by sorting on the cluster key, but you say that this key could not be kept for future use. That raises the question: why defragment using this key at all?

It would make sense to create this clustered key and keep it, as you obviously want/need the data sorted that way. You say that data changes would incur data movement penalties that can't be borne; have you thought about creating the index with a lower FILLFACTOR than the default? Depending on your data change patterns, you could benefit from a value as low as 80%. You then have 20% 'unused' space per page, but the benefit of fewer page splits when clustered key values change.
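
This sketches the permanent-index variant. It uses placeholder names (`dbo.BigTable`, `KeyCol1`, `KeyCol2`) and the 80% figure above. The script creates the clustered index with FILLFACTOR = 80, then watches leaf-page fullness and fragmentation. Rebuild once the free space has been consumed.

```sql
-- Sketch: keep the clustered index permanently, leaving roughly 20% free
-- space in each leaf page. All names are placeholders.
CREATE CLUSTERED INDEX CIX_BigTable_Key
    ON dbo.BigTable (KeyCol1, KeyCol2)
    WITH (FILLFACTOR = 80, PAD_INDEX = ON);

-- Later: how full are the leaf pages, and how fragmented is the index?
SELECT  ips.avg_page_space_used_in_percent,
        ips.avg_fragmentation_in_percent,
        ips.page_count
FROM    sys.dm_db_index_physical_stats(
            DB_ID(), OBJECT_ID(N'dbo.BigTable'), 1, NULL, 'SAMPLED') AS ips
WHERE   ips.index_level = 0                        -- leaf level
  AND   ips.alloc_unit_type_desc = 'IN_ROW_DATA';

-- When the free space has been used up, a rebuild restores it:
-- ALTER INDEX CIX_BigTable_Key ON dbo.BigTable REBUILD WITH (FILLFACTOR = 80);
```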

Could that help you?

#3

You may be able to compact the heap by running DBCC SHRINKFILE with NOTRUNCATE.
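
For reference, the command looks like the following. `MyDb` and `MyDb_Data` are placeholder names; the logical file name is in `sys.database_files`. As far as I know, a shrink relocates whole pages toward the start of the file rather than merging rows from partly empty pages. It may therefore not reclaim the free space inside the heap's pages, and it tends to increase index fragmentation.

```sql
-- Placeholder database and logical file names.
USE MyDb;

-- Move allocated pages toward the start of the file, but do not release
-- the freed space at the end of the file to the operating system.
DBCC SHRINKFILE (N'MyDb_Data', NOTRUNCATE);

-- Re-check index fragmentation afterwards; shrinking usually increases it.
```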

Based on the comments, I see you haven't tested with a permanent clustered index.

To put this in perspective, we have a database that takes 10 million new rows per day, with clustered indexes on all tables. Deleted "gaps" (and also forward pointers/page splits) are removed by a scheduled ALTER INDEX.
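
Here is a sketch of that kind of scheduled statement, assuming a placeholder table `dbo.BigTable` that has a clustered index. The clustered index holds the data rows, so these statements compact the data pages as well. On a heap, as far as I know, `ALTER INDEX ALL` only reaches the nonclustered indexes.

```sql
-- Placeholder table name; the table has a clustered index.
-- REORGANIZE is always online and can be stopped; work done so far is kept.
ALTER INDEX ALL ON dbo.BigTable REORGANIZE;

-- For heavier fragmentation, rebuild instead:
-- ALTER INDEX ALL ON dbo.BigTable REBUILD WITH (FILLFACTOR = 90);
```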

Your 12GB table may be 2GB after indexing: it merely has 12GB allocated, and that space is massively fragmented too.
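
One way to check that, assuming a placeholder heap `dbo.BigHeap`: `sp_spaceused` shows reserved versus used space, and the DMV shows how full the heap's pages are. The last column is only a rough estimate (pages × average fullness) that ignores fill factor and row overhead.

```sql
-- Reserved, data, index_size and unused space for the (placeholder) table.
EXEC sys.sp_spaceused N'dbo.BigHeap';

-- How full are the heap's in-row data pages?
SELECT  ips.page_count * 8 / 1024.0                     AS allocated_mb,
        ips.avg_page_space_used_in_percent,              -- low = mostly empty pages
        ips.page_count * 8 / 1024.0
            * ips.avg_page_space_used_in_percent / 100   AS approx_mb_if_compacted
FROM    sys.dm_db_index_physical_stats(
            DB_ID(), OBJECT_ID(N'dbo.BigHeap'), 0, NULL, 'SAMPLED') AS ips
WHERE   ips.alloc_unit_type_desc = 'IN_ROW_DATA';
```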

#4

I understand your pain at being constrained by a legacy design.

Do you have the opportunity to restore a backup of the table in question on another server and create a clustered index there? It is very possible that a clustered index created on a set of narrow unique columns, or on an identity column, will reduce the total table size (data and indexes).
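
This sketches that test, to be run on a non-production server. The database, logical file, path, table and column names are all placeholders. `RESTORE FILELISTONLY` lists the logical file names stored in the backup so you can fill in the `MOVE` clauses.

```sql
-- Which logical files does the backup contain?
RESTORE FILELISTONLY FROM DISK = N'D:\Backups\MyDb.bak';

-- Restore under a new name, relocating the files (placeholder paths).
RESTORE DATABASE MyDb_Test
    FROM DISK = N'D:\Backups\MyDb.bak'
    WITH MOVE N'MyDb_Data' TO N'E:\Data\MyDb_Test.mdf',
         MOVE N'MyDb_Log'  TO N'F:\Log\MyDb_Test.ldf',
         STATS = 10;
GO
USE MyDb_Test;
GO
EXEC sys.sp_spaceused N'dbo.BigHeap';     -- size before

CREATE CLUSTERED INDEX CIX_BigHeap_Test ON dbo.BigHeap (NarrowKeyCol);

EXEC sys.sp_spaceused N'dbo.BigHeap';     -- size after
```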

In one of my legacy apps, all the data was accessed via views. I was able to modify the schema of the underlying table, adding an identity column and a clustered index, without affecting the application.
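
This sketches that schema change with placeholder names (`dbo.BigHeap`, `RowId`, `dbo.SomeView`). Adding a NOT NULL identity column populates every existing row, so on a large table expect it to take time and log space. Views with explicit column lists are unaffected. Views defined with `SELECT *` keep their old metadata until they are refreshed with `sp_refreshview`.

```sql
-- Add a surrogate identity column (placeholder names).
ALTER TABLE dbo.BigHeap ADD RowId int IDENTITY(1, 1) NOT NULL;

-- Cluster on it.
CREATE UNIQUE CLUSTERED INDEX CIX_BigHeap_RowId ON dbo.BigHeap (RowId);

-- Only needed for views defined with SELECT *:
-- EXEC sys.sp_refreshview N'dbo.SomeView';
```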

Another drawback of having a heap is the extra IO associated with any forwarded rows.
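
Here is a sketch for measuring that cost, assuming a placeholder heap `dbo.BigHeap`. The first query counts forwarded records. The second shows how many forwarded fetches have occurred. The operational-stats counters are not persistent; they reset, for example, when the instance restarts.

```sql
-- How many rows in the heap have been moved and left a forwarding pointer?
SELECT  ips.forwarded_record_count,
        ips.record_count
FROM    sys.dm_db_index_physical_stats(
            DB_ID(), OBJECT_ID(N'dbo.BigHeap'), 0, NULL, 'DETAILED') AS ips
WHERE   ips.alloc_unit_type_desc = 'IN_ROW_DATA';

-- How often have queries had to follow those pointers (an extra page read each)?
SELECT  ios.forwarded_fetch_count
FROM    sys.dm_db_index_operational_stats(
            DB_ID(), OBJECT_ID(N'dbo.BigHeap'), 0, NULL) AS ios;
```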

I found the article below effective when I was asked whether there was any PROOF that we needed a permanent clustered index on the table.

This article is by Microsoft

#5

The problem that no one is talking about is FRAGMENTATION OF THE DATA OR LOG DEVICE FILES ON THE HARD DRIVE(S) THEMSELVES!! Everyone talks about fragmentation of the indexes and how to avoid/limit that fragmentation.

FYI: When you create a database, you specify the INITIAL size of the .MDF along with how much it will grow by when it needs to grow. You do the same with the .LDF file. THERE IS NO GUARANTEE THAT WHEN THESE TWO FILES GROW THAT THE DISK SPACE ALLOCATED FOR THE EXTRA DISK SPACE NEEDED WILL BE PHYSICALLY CONTIGUOUS WITH THE EXISTING DISK SPACE ALLOCATED!!
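
This sketches pre-sizing the files so they grow rarely and in large chunks. The database name, logical file names and sizes are placeholders chosen only for illustration. Each property is changed in its own `MODIFY FILE`, and a new SIZE must be larger than the current size.

```sql
-- Placeholder names and sizes.
ALTER DATABASE MyDb MODIFY FILE (NAME = N'MyDb_Data', SIZE = 200GB);
ALTER DATABASE MyDb MODIFY FILE (NAME = N'MyDb_Data', FILEGROWTH = 10GB);
ALTER DATABASE MyDb MODIFY FILE (NAME = N'MyDb_Log',  SIZE = 40GB);
ALTER DATABASE MyDb MODIFY FILE (NAME = N'MyDb_Log',  FILEGROWTH = 4GB);
GO
USE MyDb;
GO
-- size and growth are in 8 KB pages (growth is a percentage if is_percent_growth = 1).
SELECT  name,
        type_desc,
        size * 8 / 1024 AS size_mb,
        growth,
        is_percent_growth
FROM    sys.database_files;
```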

Every time one of these two device files needs to expand, the hard drive's disk space may become fragmented. That means the heads on the hard drive need to work harder (and take more time) to move from one section of the drive to another to access the necessary data in the database. It is analogous to buying a small plot of land and building a house that just fits on it. When you need to expand the house, you have no more land available unless you buy the empty lot next door; but what if someone else has already bought that land in the meantime and built a house on it? Then you CANNOT expand your house. The only option is to buy another plot of land in the "neighborhood" and build another house on it. The problem then becomes that you and two of your children live in House A while your wife and third child live in House B. That would be a pain (as long as you were still married).

The solution to remedy this situation is to "buy a much larger plot of land, pick up the existing house (i.e. the database), move it to the larger plot and then expand the house there". Well, how do you do that with a database? Take a full backup, drop the database (unless you have plenty of free disk space to keep both the old fragmented database, just in case, and the new one), create a brand new database with plenty of initial disk space allocated (there is no guarantee that the operating system will ensure the space you request is contiguous), and then restore the database into the new database space just created. Yes, it is a pain to do, but I do not know of any "automatic disk defragmenter" software that will work on SQL database files.
