Composite clustered PK behavior vs. nonclustered PK + non-unique clustered index

Time: 2022-03-26 02:49:23

I have a table with a few columns; the two important ones are appid and fileid. Together they make up the table's PK. Typical queries against the table will be "how many files contain appid x" or "which appid is the most popular". Those queries will also be run very often on only a subset of files rather than all files. Neither column is unique on its own.

Based on that, I feel like the best choice for a clustered index would be AppId. However, setting both columns as the PK will create an extra nonclustered index, and AppId's lack of uniqueness (there will be lots of repeats) means a clustered index on it alone would need a uniquifier column behind the scenes anyway. So would it make more sense to just declare the PK as clustered and not specify another clustered index? Assuming I specify AppId first in the PK, would it treat diagnosticfileid like a uniquifier behind the scenes and give me the optimal performance that way?

EDIT: An important thing I forgot to mention originally is that AppId values won't be steadily increasing or anything, so there will be insertions into the middle of the table. I was thinking I could prevent some problems with this by using a fill factor, but the table will get pretty big, so I don't know how much that will help.

Also, the table will be inserted into fairly often, but never in large chunks: probably something like a few thousand rows an hour. No column has values that reliably increase, which would otherwise make it a good choice for a clustered index, but I'm not sure how big a deal that is. I could add an id column just to have a good value to cluster on, but I feel like that would slow down selects a lot.

2 solutions

#1


3  

If your two most popular queries are "how many files contain appId" and "which appId is most popular", you should make this indexed view:

CREATE VIEW
        v_appCount
WITH SCHEMABINDING
AS
        SELECT  appId, COUNT_BIG(*) AS cnt
        FROM    dbo.mytable
        GROUP BY
                appId
GO

CREATE UNIQUE CLUSTERED INDEX
        ux_v_appCount_appId
ON      v_appCount (appId)

This way you could run those queries:

SELECT  cnt
FROM    v_appCount
WHERE   appId = @myAppId

and

SELECT  TOP 100
        *
FROM    v_appCount va
ORDER BY
        cnt DESC

almost instantly.
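
One thing that may be worth checking with this approach: on editions of SQL Server where the optimizer does not match indexed views automatically, the view's index is generally only used when the query names the view with the NOEXPAND table hint. The sketch below is not from the original answer. It assumes the view lives in the dbo schema, that appId is an INT, and it uses a made-up value for @myAppId.

```sql
-- Sketch only: assumes dbo.v_appCount already exists with its
-- unique clustered index, as created above.
DECLARE @myAppId INT = 42;  -- hypothetical value; the INT type is an assumption

-- How many files contain a given appId, read straight from the view's index.
SELECT  cnt
FROM    dbo.v_appCount WITH (NOEXPAND)
WHERE   appId = @myAppId;

-- The most popular appIds: highest counts first.
SELECT  TOP 100
        appId, cnt
FROM    dbo.v_appCount WITH (NOEXPAND)
ORDER BY
        cnt DESC;
```

The view holds one row per appId, so the second query sorts far fewer rows than the base table contains. Note that the view counts across all files; it does not by itself cover the "subset of files" case from the question.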

#2


1  

The problem with compound PKs arises when they are clustered: an insert into the middle of the key range can force page splits, which physically reorganize part of the table's contents. If the table is not expected to reach ginormous sizes, it probably won't matter, but it is definitely something to consider. I should add that if this is a read-heavy table with few inserts, that also limits the impact of inserts into the middle of the primary key. You could definitely make it a nonclustered primary key, but that has implications for select performance.
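
For reference, the clustered option under discussion might be declared roughly as follows. This is a sketch under assumptions: the table name dbo.mytable is borrowed from the first answer, and the INT column types, the constraint name and the fill factor of 80 are all made up for illustration.

```sql
-- Sketch only: column types, constraint name and fill factor are assumptions.
CREATE TABLE dbo.mytable
(
    appid   INT NOT NULL,
    fileid  INT NOT NULL,
    -- other columns omitted
    CONSTRAINT PK_mytable
        PRIMARY KEY CLUSTERED (appid, fileid)  -- appid first, so rows for one appid sit together
        WITH (FILLFACTOR = 80)                 -- leave roughly 20% free space per leaf page at build time
);
```

Because the composite key (appid, fileid) is unique, SQL Server does not need to add a hidden uniquifier to it. The fill factor is applied when the index is built or rebuilt, not on every insert, so it only keeps helping with mid-table inserts if the index is rebuilt from time to time.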

EDIT
Considering your edit, I would recommend an auto-incrementing PK (nonclustered) plus a unique constraint on (appid, fileid), which also creates a unique nonclustered index. Basically, I wouldn't recommend a clustered index on this table. I don't think you'll see much of a performance difference without it, but you would with it if you did thousands of inserts into the middle of the table. Deadlocks will haunt you.
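
A minimal sketch of that layout, assuming the table is called dbo.mytable as in the first answer. The id column, the column types and the constraint names are hypothetical.

```sql
-- Sketch only: names and types are assumptions, not taken from the question.
CREATE TABLE dbo.mytable
(
    id      BIGINT IDENTITY(1, 1) NOT NULL,  -- auto-incrementing surrogate key
    appid   INT NOT NULL,
    fileid  INT NOT NULL,
    -- other columns omitted
    CONSTRAINT PK_mytable
        PRIMARY KEY NONCLUSTERED (id),
    CONSTRAINT UQ_mytable_appid_fileid
        UNIQUE NONCLUSTERED (appid, fileid)  -- enforces the original composite key
);
```

With no clustered index the table is stored as a heap, so new rows do not have to be placed in key order. The unique index on (appid, fileid) has appid as its leading column, so a count for a single appid can still be answered with a seek on that index.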

Take a quick look at this article. While it is old, the principles still apply.
