SQL Server database with clustered GUID PKs - switch the clustered index or switch to sequential (comb) GUIDs?

Time: 2021-01-15 02:46:33

We have a database in which all the PKs are GUIDs, and most of the PKs are also the clustered index for their table. We know that this is bad because of the random nature of GUIDs. Short of throwing out GUIDs as PKs altogether, which we cannot do (at least not at this time), there seem to be two options:

  • We could change the GUID generation algorithm, for example to the one that NHibernate uses, as detailed in this post, or
  • for the tables that are under the heaviest use, we could change to a different clustered index, e.g. an IDENTITY column, and keep the "random" GUIDs as PKs.
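
To make the first option concrete, here is a minimal Python sketch of a comb-style GUID. It is an assumption-laden illustration, not NHibernate's exact algorithm: the name `comb_guid`, the millisecond Unix timestamp, and the 10 random + 6 timestamp byte split are all choices made for this example. It relies on SQL Server comparing `uniqueidentifier` values starting with the last six bytes, so the timestamp goes there in big-endian order.

```python
import os
import time
import uuid


def comb_guid(now_ms=None):
    """Return a GUID whose last 6 bytes hold a millisecond timestamp.

    Hypothetical helper for illustration; not NHibernate's exact layout.
    """
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    random_part = os.urandom(10)  # bytes 0-9 stay random
    # Big-endian, so a later timestamp compares higher byte by byte.
    time_part = (now_ms % (1 << 48)).to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + time_part)


earlier = comb_guid(now_ms=1_000)
later = comb_guid(now_ms=2_000)
print(earlier, later)
# The trailing 12 hex digits grow with time; everything else is random.
print(earlier.node < later.node)  # True
```

Two GUIDs generated within the same millisecond share the timestamp part and are ordered by their random bytes, so values are only roughly increasing.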

Is it possible to give any general recommendations in such a scenario?

The application in question has 500+ tables. The largest currently has about 1.5 million rows, a few tables have around 500,000 rows, and the rest are significantly smaller (most of them well below 10K).

Furthermore, the application is already installed at several customer sites, so we have to take any possible negative effects on existing customers into consideration.

Thanks!

2 answers

#1


3  

If you can easily change your GUID generation to sequential GUIDs, that is probably your quick win. Sequential GUIDs will stop the fragmentation of the table while remaining your clustered index. The major downside is that sequential GUIDs become guessable, which is often undesirable, since being hard to guess is frequently the reason GUIDs were chosen in the first place.
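
The fragmentation effect can be illustrated with a toy model. The sketch below is not SQL Server's storage engine: it assumes a hypothetical page capacity of 100 keys, splits a full page in half on a mid-index insert, and starts a fresh page when a key lands past the end of the index. It then compares ever-increasing keys with randomly ordered ones.

```python
import bisect
import random

PAGE_CAPACITY = 100  # hypothetical number of keys that fit on one page


def simulate_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert keys one by one into a toy leaf level; return (splits, fill)."""
    pages = [[]]            # leaf pages, kept in key order
    lows = [float("-inf")]  # lower key bound of each page
    splits = 0
    for key in keys:
        i = bisect.bisect_right(lows, key) - 1  # page that should hold key
        page = pages[i]
        if len(page) >= capacity:
            if i == len(pages) - 1 and key > page[-1]:
                # Appending past the end: start a fresh page, no split.
                pages.append([key])
                lows.append(key)
                continue
            # Mid-index insert into a full page: move the upper half away.
            new_page = page[capacity // 2:]
            del page[capacity // 2:]
            pages.insert(i + 1, new_page)
            lows.insert(i + 1, new_page[0])
            splits += 1
            if key >= new_page[0]:
                page = new_page
        bisect.insort(page, key)
    fill = sum(len(p) for p in pages) / (len(pages) * capacity)
    return splits, fill


n = 20_000
sequential = list(range(n))                       # ever-increasing keys
shuffled = random.Random(0).sample(range(n), n)   # stand-in for random GUIDs
print(simulate_inserts(sequential))  # (0, 1.0)
print(simulate_inserts(shuffled))    # many splits, pages only partly full
```

In this model the ever-increasing keys never split a page and leave every page full, while the shuffled keys split pages constantly and leave them partly empty, which is the behavior the answer attributes to random GUIDs.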

If you go down the IDENTITY route for your clustered primary key and keep just a nonclustered index on your GUID column, you will still get a lot of fragmentation in that GUID index. However, the table itself will no longer fragment, which is a massive gain.

Finally, I know you said you can't do this for now, but if you don't NEED to index GUIDs at all, all of these problems go away.

#2


7  

My opinion is clear: use an INT IDENTITY for your clustering key. That's by far the best clustering key, because it is:

  • small
  • stable (should never change)
  • unique
  • ever increasing

Sequential GUIDs are definitely a lot better than regular random GUIDs, but they're still four times larger than an INT (16 vs. 4 bytes), and this will be a factor if you have lots of rows in your table and lots of non-clustered indexes on that table, too. The clustering key is added to each and every non-clustered index, which significantly increases the negative effect of 16 vs. 4 bytes. More bytes mean more pages on disk and in SQL Server RAM, and thus more disk I/O and more work for SQL Server.
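
As a rough back-of-the-envelope check of that size argument, the Python sketch below estimates the extra bytes a 16-byte clustering key costs compared with a 4-byte one. The row count is the largest table from the question; the number of non-clustered indexes (5) and the helper name `extra_key_bytes` are assumptions for illustration, and row overhead and page headers are ignored.

```python
GUID_BYTES = 16
INT_BYTES = 4
PAGE_BYTES = 8192  # SQL Server data pages are 8 KB


def extra_key_bytes(rows, nonclustered_indexes):
    """Extra bytes from carrying a 16-byte rather than a 4-byte
    clustering key in every row of every non-clustered index."""
    return rows * nonclustered_indexes * (GUID_BYTES - INT_BYTES)


rows = 1_500_000  # largest table mentioned in the question
indexes = 5       # hypothetical number of non-clustered indexes
extra = extra_key_bytes(rows, indexes)
print(extra)                # 90000000
print(extra // PAGE_BYTES)  # 10986, i.e. roughly 11,000 extra pages
```

Under these assumptions the difference is about 90 MB for the largest table, so the effect is real but modest at this table size and grows linearly with rows and indexes.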

You can definitely keep the GUID as the primary key where appropriate, but in that case I'd strongly recommend adding a separate INT IDENTITY column to that table and making that INT the clustering key. I've done that myself with a number of large tables, and the results are astonishing: table fragmentation dropped from 99 percent or more to a few percent, and performance is much better.

Check out Kimberly Tripp's excellent series on why GUIDs are bad as clustering keys in SQL Server.

Marc
