Does a clustered index on a foreign key column improve join performance compared with a non-clustered one?

Date: 2020-12-18 02:47:15

Many sources say that clustered indexes are best used to select a range of rows with a BETWEEN predicate. When I join on a foreign key column in such a way that this clustered index is used, I would guess that clustering helps too, because a range of rows is still being selected, even though they all share the same clustered key value and BETWEEN is not used.

Considering that I care only about that one SELECT with a JOIN and nothing else, is my guess wrong?

5 answers

#1


7  

Discussing this type of issue in the absolute isn't very useful.

It is always a case-by-case situation!

Essentially, access by way of a clustered index saves one indirection, period.

Assuming the key used in the JOIN is that of the clustered index, a single read [whether from an index seek, a scan, or a partial scan, it doesn't matter] gets you the whole row (record).
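
To make the "one indirection" point concrete, here is a rough sketch in Python. It is a toy model and not SQL Server internals: the table, the column names (`order_id`, `customer_id`, `total`) and the read counts are all assumptions made up for illustration, and each dictionary or list access simply stands for one structure the engine has to read.

```python
# Toy model, not SQL Server internals: each dict/list access stands for
# one structure the engine has to read. All names are hypothetical.

ROWS = [
    {"order_id": 1, "customer_id": 7, "total": 19.90},
    {"order_id": 2, "customer_id": 7, "total": 5.00},
    {"order_id": 3, "customer_id": 9, "total": 42.00},
]

# Clustered index: the leaf level *is* the row, keyed by order_id.
clustered_leaf = {row["order_id"]: row for row in ROWS}

# Non-clustered index over a heap: the leaf holds only a row locator
# (here a list position), and the row itself lives elsewhere.
heap = list(ROWS)
nonclustered_leaf = {row["order_id"]: pos for pos, row in enumerate(heap)}


def clustered_lookup(key):
    """Return (row, reads): the leaf entry already contains the row."""
    return clustered_leaf[key], 1


def nonclustered_lookup(key):
    """Return (row, reads): read the locator, then follow it to the row."""
    locator = nonclustered_leaf[key]  # read 1: index leaf
    return heap[locator], 2           # read 2: the row itself


row_a, reads_a = clustered_lookup(3)
row_b, reads_b = nonclustered_lookup(3)
print(row_a == row_b, reads_a, reads_b)
```

Both lookups return the same row; the only difference in this model is the extra hop from the locator to the row, which is the indirection a clustered index avoids.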

One problem with clustered indexes is that you only get one per table, so you need to use it wisely. Indeed, in some cases it is even wiser not to use any clustered index at all, because of INSERT overhead and fragmentation (depending on the key, the order of new keys, etc.).

Sometimes one gets the equivalent benefits of a clustered index with a covering index, i.e. an index with the desired key sequence, followed by the column values we are interested in. Just like a clustered index, a covering index doesn't require the indirection to the underlying table. Indeed, the covering index may be slightly more efficient than the clustered index, because it is smaller.
However, just as with clustered indexes, and aside from the storage overhead, there is a performance cost associated with any extra index during INSERT (and DELETE or UPDATE) queries.
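
A covering index is easy to try out. The sketch below uses SQLite through Python's built-in `sqlite3` module purely as a convenient stand-in, because it runs anywhere; it is not SQL Server, and its plan output looks different from a SQL Server execution plan. The `orders` table, its columns and the index name `ix_orders_customer_total` are hypothetical.

```python
import sqlite3

# SQLite is only a stand-in here; table, columns and index name are
# hypothetical and chosen for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        total       REAL NOT NULL,
        notes       TEXT
    );
    -- Key column first, then the extra column the query wants back.
    CREATE INDEX ix_orders_customer_total ON orders (customer_id, total);
    """
)
conn.executemany(
    "INSERT INTO orders (customer_id, total, notes) VALUES (?, ?, ?)",
    [(i % 50, i * 1.5, "n/a") for i in range(1000)],
)


def plan(sql):
    """Return the planner's description (last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)


# Every requested column is in the index: no trip to the table is needed.
covered = plan("SELECT customer_id, total FROM orders WHERE customer_id = 7")

# 'notes' is not in the index, so each match needs a lookup into the table.
not_covered = plan("SELECT customer_id, notes FROM orders WHERE customer_id = 7")

print(covered)
print(not_covered)
```

In the first plan SQLite reports that it uses a COVERING INDEX; in the second it does not, because `notes` has to be fetched from the table. On SQL Server one would look for the analogous difference (an index seek with or without a key lookup) in the execution plan.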

And, yes, as indicated in other answers, the "foreign-key-ness" of the key used for the clustered index has absolutely no bearing on the performance of the index. FKs are constraints aimed at easing the maintenance of the integrity of the database, but the underlying fields (columns) are otherwise just like any other field in the table.

To make wise decisions about index structure, one needs

  • to understand the way the various index types (and the heap) work
    (and, BTW, this varies somewhat between SQL implementations)
  • to have a good picture of the statistical profile of the database(s) at hand:
    which are the big tables, which are the relations, what's the average/maximum cardinality of each relation, what's the typical growth rate of the database, etc.
  • to have good insight into the way the database(s) is (are) going to be used/queried

Then, and only then, can one make educated guesses about the value [or lack thereof] of having a given clustered index.

#2


2  

An index on the FK column will help the JOIN because the index itself is ordered: clustered just means that the data rows themselves (the leaf level) are stored in key order, rather than only the B-tree above them.
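
Why ordering helps can be sketched in a few lines of Python. This is a simplified illustration, not how any particular engine implements a seek: the sorted list `index_keys` stands in for the ordered FK values of an index, and `order_ids` for whatever the index entries point to; both are invented sample data.

```python
from bisect import bisect_left, bisect_right

# Index entries sorted by the FK value (hypothetical sample data).
index_keys = [1, 1, 2, 2, 2, 3, 5, 5]          # customer_id, in key order
order_ids = [10, 11, 12, 13, 14, 15, 16, 17]   # what each entry points to


def seek_equal(keys, payload, value):
    """Find all entries with the given key using two binary searches.

    Because the keys are sorted, every match sits in one contiguous
    slice, so an equality seek is really a small range scan.
    """
    lo = bisect_left(keys, value)
    hi = bisect_right(keys, value)
    return payload[lo:hi]


print(seek_equal(index_keys, order_ids, 2))
print(seek_equal(index_keys, order_ids, 4))
```

All rows for one FK value come back as a single contiguous slice, which is the sense in which a join on a key "selects a range" even though no BETWEEN appears in the query. This holds for any ordered index, clustered or not.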

If you change it to a covering index, then clustered vs non-clustered is irrelevant. What's important is to have a useful index.

#3


2  

I would ask something else: would it be wise to put my clustered index on a foreign key column just to speed up a single JOIN? It probably helps, but... at a price!

A clustered index makes a table faster, for every operation. YES! It does. See Kim Tripp's excellent "The Clustered Index Debate Continues" for background info. She also mentions her main criteria for a clustered index:

  • narrow
  • static (never changes)
  • unique
  • if at all possible: ever increasing

INT IDENTITY fulfills this perfectly - GUIDs do not. See "GUID's as Primary Key" for extensive background info.

Why narrow? Because the clustering key is added to every entry of every non-clustered index on the same table (so that the data row can actually be looked up, if needed). You don't want to have a VARCHAR(200) in your clustering key...

Why unique? See above - the clustering key is the item and mechanism that SQL Server uses to uniquely find a data row. It has to be unique. If you pick a non-unique clustering key, SQL Server itself will add a 4-byte uniquifier to your keys. Be careful of that!

So those are my criteria - put your clustering key on a narrow, stable, unique, hopefully ever-increasing column. If your foreign key column matches those - perfect!

However, I would not under any circumstances put my clustering key on a wide or even compound foreign key. Remember: the value(s) of the clustering key are added to each and every non-clustered index entry on that table! If you have 10 non-clustered indexes and 100,000 rows in your table, that's one million entries. It makes a huge difference whether that's a 4-byte integer or a 200-byte VARCHAR - HUGE. And not just on disk - in server memory as well. Think very, very carefully about what to make your clustered index!
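
The arithmetic behind that claim is quick to check. The snippet below only multiplies the numbers quoted above (10 non-clustered indexes, 100,000 rows, 4-byte versus 200-byte keys); it assumes every VARCHAR value is at its full 200 bytes and ignores per-row and per-page overhead, so treat the results as rough upper-bound figures rather than measured sizes.

```python
# Numbers taken from the paragraph above; everything else is a
# simplifying assumption (no row/page overhead, VARCHAR always full).
NONCLUSTERED_INDEXES = 10
ROWS = 100_000


def clustering_key_overhead_bytes(key_width_bytes):
    """Bytes spent repeating the clustering key in non-clustered indexes."""
    entries = NONCLUSTERED_INDEXES * ROWS  # one entry per row per index
    return entries * key_width_bytes


int_overhead = clustering_key_overhead_bytes(4)
varchar_overhead = clustering_key_overhead_bytes(200)

print(f"INT key:          {int_overhead / 1_000_000:.0f} MB")
print(f"VARCHAR(200) key: {varchar_overhead / 1_000_000:.0f} MB")
print(f"ratio:            {varchar_overhead // int_overhead}x")
```

Under these assumptions the integer key costs about 4 MB across the ten indexes and the wide key about 200 MB, a factor of 50 for exactly the same rows.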

SQL Server might also need to add a uniquifier, making things even worse. And if the values ever change, SQL Server has to do a lot of bookkeeping and updating all over the place.

So in short:

  • putting an index on your foreign keys is definitely a great idea - do it all the time!
  • I would be very, very careful about making that a clustered index. First of all, you only get one clustered index, so which FK relationship are you going to pick? And don't put the clustering key on a wide or constantly changing column

#4


1  

It depends on the database implementation.

For SQL Server, a clustered index is a data structure in which the data rows themselves are stored in pages at the leaf level of a B-tree, with the upper levels of the B-tree stored as separate index pages. The reason you get fast performance is that you can get to the start of the chain quickly, and a range is then an easy linked list of pages to follow.

A non-clustered index is a data structure that contains pointers to the actual records, and as such it raises different concerns.

Refer to the documentation regarding Clustered Index Structures.

An index does not help because of the foreign key relationship as such, but it does help through the concept of a "covering" index. If your WHERE clause contains a condition on the indexed columns, the engine can produce the result set faster. That is where the performance comes from.

#5


0  

The performance gains usually come if you are selecting data sequentially within the cluster. It also depends heavily on the size of the table (data) and on the conditions in your BETWEEN clause.
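
A rough way to picture "sequential within the cluster" is to count how many pages hold the rows for one key value. The sketch below is a simplified model with made-up numbers (100 rows per page, 1,000 parent keys, 100 child rows each, children inserted round-robin); real page layouts, fill factors and fragmentation will differ.

```python
# Simplified page model; all sizes are assumptions for illustration.
ROWS_PER_PAGE = 100
NUM_PARENTS = 1000
CHILDREN_PER_PARENT = 100


def pages_touched(row_order, parent_id):
    """Count distinct pages that contain at least one row for parent_id."""
    return len(
        {
            position // ROWS_PER_PAGE
            for position, fk in enumerate(row_order)
            if fk == parent_id
        }
    )


# Heap in arrival order: children of different parents are interleaved.
heap_order = [i % NUM_PARENTS for i in range(NUM_PARENTS * CHILDREN_PER_PARENT)]

# Clustered on the FK: rows with the same key are stored next to each other.
clustered_order = sorted(heap_order)

heap_pages = pages_touched(heap_order, 42)
clustered_pages = pages_touched(clustered_order, 42)
print(heap_pages, clustered_pages)
```

In this model the 100 child rows of one parent are spread over 100 pages in the heap but sit on a single page when the table is clustered on the FK. That is the benefit the question guesses at, and it only matters when the query actually needs the full rows rather than columns already present in a covering index.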
