I've inherited some database creation scripts for a SQL SERVER 2005 database.
我为SQL SERVER 2005数据库继承了一些数据库创建脚本。
One thing I've noticed is that all primary keys are created as NON CLUSTERED
indexes as opposed to clustered.
我注意到的一件事是,所有主键都创建为非集群索引,而不是集群。
I know that you can only have one clustered index per table and that you may want to have it on a non primary key column for query performance of searches etc. However there are no other CLUSTERED
indexes on the tables in questions.
我知道每个表只能有一个聚簇索引,并且您可能希望将它放在非主键列上以便查询搜索性能等。但是问题中的表中没有其他CLUSTERED索引。
So my question is are there any technical reasons not to have clustered indexes on a primary key column apart from the above.
所以我的问题是,除了上述内容之外,是否有任何技术原因不在主键列上使用聚簇索引。
5 个解决方案
#1
8
On any "normal" data or lookup table: no, I don't see any reason whatsoever.
在任何“正常”数据或查找表上:不,我没有看到任何理由。
On stuff like bulk import tables, or temporary tables - it depends.
在诸如批量导入表或临时表之类的东西上 - 这取决于。
To some people surprisingly, it appears that having a good clustered index actually can speed up operations like INSERT or UPDATE. See Kimberly Tripps excellent The Clustered Index Debate continues.... blog post in which she explains in great detail why this is the case.
令人惊讶的是,看起来拥有良好的聚簇索引实际上可以加速INSERT或UPDATE等操作。看到金伯利·特里普斯出色的聚集指数辩论继续....博客文章中,她详细解释了为什么会这样。
In this light: I don't see any valid reason not to have a good clustered index (narrow, stable, unique, ever-increasing = INT IDENTITY
as the most obvious choice) on any SQL Server table.
有鉴于此:我认为没有任何正当理由不在任何SQL Server表上拥有良好的聚簇索引(窄,稳定,唯一,不断增加= INT IDENTITY作为最明显的选择)。
To get some deep insights into how and why to choose clustering keys, read all of Kimberly Tripp's excellent blog posts on the topic:
要深入了解如何以及为何选择群集密钥,请阅读Kimberly Tripp关于该主题的所有优秀博文:
http://www.sqlskills.com/BLOGS/KIMBERLY/category/Clustering-Key.aspx
http://www.sqlskills.com/BLOGS/KIMBERLY/category/Clustering-Key.aspx
http://www.sqlskills.com/BLOGS/KIMBERLY/category/Clustered-Index.aspx
http://www.sqlskills.com/BLOGS/KIMBERLY/category/Clustered-Index.aspx
Excellent stuff from the "Queen of Indexing" ! :-)
来自“索引女王”的优秀作品! :-)
#2
6
Clustered Tables vs Heap Tables
集群表与堆表
(Good article on subject at www.mssqltips.com)
(关于主题的好文章,请访问www.mssqltips.com)
HEAP Table (Without clustered index)
HEAP表(没有聚簇索引)
-
Data is not stored in any particular order
数据不以任何特定顺序存储
-
Specific data can not be retrieved quickly, unless there are also non-clustered indexes
除非还存在非聚集索引,否则无法快速检索特定数据
-
Data pages are not linked, so sequential access needs to refer back to the index allocation map (IAM) pages
数据页未链接,因此顺序访问需要返回索引分配映射(IAM)页面
-
Since there is no clustered index, additional time is not needed to maintain the index
由于没有聚簇索引,因此不需要额外的时间来维护索引
-
Since there is no clustered index, there is not the need for additional space to store the clustered index tree
由于没有聚簇索引,因此不需要额外的空间来存储聚簇索引树
-
These tables have a index_id value of 0 in the sys.indexes catalog view
这些表在sys.indexes目录视图中的index_id值为0
Clustered Table
集群表
-
Data is stored in order based on the clustered index key
数据基于聚簇索引键按顺序存储
-
Data can be retrieved quickly based on the clustered index key, if the query uses the indexed columns
如果查询使用索引列,则可以基于聚簇索引键快速检索数据
-
Data pages are linked for faster sequential access Additional time is needed to maintain clustered index based on INSERTS, UPDATES and DELETES
链接数据页以加快顺序访问需要额外的时间来维护基于INSERTS,UPDATES和DELETES的聚簇索引
-
Additional space is needed to store clustered index tree These tables have a index_id value of 1 in the sys.indexes catalog view
存储聚簇索引树需要额外的空间这些表在sys.indexes目录视图中的index_id值为1
#3
1
Please read my answer under "No direct access to data row in clustered table - why?", first. Specifically item [2] Caveat.
请首先阅读我的回答“无法直接访问群集表中的数据行 - 为什么?”。具体项目[2]警告。
The people who created the "database" are cretins. They had:
创建“数据库”的人是cretins。他们有:
- a bunch of unnormalised spreadhseets, not normalised relational tables
- 一堆非规范化的spreadhseets,而不是规范化的关系表
- the PKs are all IDENTITY columns (the spreadsheets are linked to each other; they have to be navigated one-by-one-by-one); there is no relational access or relational power across the database
- PK都是IDENTITY列(电子表格彼此链接;它们必须逐个导航);数据库中没有关系访问或关系功能
- they had PRIMARY KEY, which produce UNIQUE CLUSTERED
- 他们有PRIMARY KEY,它产生了独特的聚集
- they found that that prevented concurrency
- 他们发现这阻止了并发性
- they removed the CI and made them all NCIs
- 他们删除了CI并使他们成为所有NCI
- they were too lazy to finish the reversal; to nominate an alternate (current NCI) to become the new CI, for each table
- 他们懒得完成逆转;为每个表提名一个备用(当前NCI)成为新的CI
- the IDENTITY column remains the Primary Key (it isn't really, but it is in this hamfisted implementation)
- IDENTITY列仍然是主键(它不是真的,但它在这个hamfisted实现中)
For such collections of spreadsheets masquerading as databases, it is becoming more and more common to avoid CIs altogether, and just have NCIs plus the Heap. Obviously they get none of the power or benefits of the CI, but hell, they get none of the power or benefit of Relational databases, so who cares that they get none of the power of CIs (which were designed for Relational databases, which theirs is not). The way they look at it, they have to "refactor" the darn thing every so often anyway, so why bother. Relational databases do not need "refactoring".
对于伪装成数据库的电子表格集合,完全避免CI会变得越来越普遍,并且只有NCI加上堆。显然他们没有得到CI的权力或好处,但是,他们没有得到关系数据库的权力或利益,所以谁关心他们没有得到CI的功能(这是为关系数据库设计的,他们的不是)。他们看待它的方式,他们不得不经常“重构”这个糟糕的事情,所以为什么要这么麻烦。关系数据库不需要“重构”。
If you need to discuss this response further, please post the CREATE TABLE/INDEX DDL; otherwise it is a time-wasting academic argument.
如果您需要进一步讨论此响应,请发布CREATE TABLE / INDEX DDL;否则这是浪费时间的学术争论。
#4
0
Here is another (have it already been provided in other answers?) possible reason (still to be understood):
这是另一个(已经在其他答案中提供过吗?)可能的原因(仍有待理解):
- SQL Server - Poor performance of PK delete
- SQL Server - PK删除性能不佳
I hope, I shall update later but for now it is rather the desire to link these topics
我希望,我稍后会更新,但现在更愿意将这些主题联系起来
Update:
What do I miss in understanding the clustered index?
更新:在理解聚簇索引时我会错过什么?
#5
0
With some b-tree servers/programming languages still used today, fixed or variable length flat ascii files are used for storing data. When a new data record/row is added to a file (table), the record is (1) appended to the end of the file (or replaces a deleted record) and (2) the indexes are balanced. When data is stored this way, you don't have to be concerned about system performance (as far as what the b-tree server is doing to return a pointer to the first data record). The response time is only effected by the # of nodes in your index files.
对于今天仍在使用的一些b树服务器/编程语言,固定或可变长度的平面ascii文件用于存储数据。将新数据记录/行添加到文件(表)时,记录将(1)附加到文件末尾(或替换已删除的记录)和(2)索引是平衡的。以这种方式存储数据时,您不必担心系统性能(就b-tree服务器执行的操作而言,返回指向第一个数据记录的指针)。响应时间仅受索引文件中的节点数的影响。
When you get into using SQL, you hopefully come to realize that system performance has to be considered whenever you write an SQL statement. Using an "ORDER BY" statement on a non-indexed column can bring a system to its knees. Using a clustered index might put an unnecessary load on the CPU. It's the 21st century and I wish we didn't have to think about system performance when programming in SQL, but we still do.
当您开始使用SQL时,您希望每当编写SQL语句时都必须考虑系统性能。在非索引列上使用“ORDER BY”语句可以使系统瘫痪。使用聚簇索引可能会给CPU带来不必要的负载。这是21世纪,我希望在SQL编程时我们不必考虑系统性能,但我们仍然这样做。
With some older programming languages, it was mandatory to use an index whenever sorted data is retrieved. I only wish this requirement was still in place today. I can only wonder how many companies have updated their slow computer systems due to a poorly written SQL statement on non-indexed data.
对于一些较旧的编程语言,每当检索到排序数据时都必须使用索引。我只希望今天这个要求仍然存在。我只能想知道有多少公司更新了他们的慢速计算机系统,因为在非索引数据上编写的SQL语句写得不好。
In my 25 years of programming, I've never needed my physical data stored in a particular order, so maybe that is why some programmers avoid using clustered indexes. It's hard to know what the tradeoff is (storage time, verses retrieval time) especially if the system you are designing might store millions of records someday.
在我25年的编程中,我从未需要以特定顺序存储我的物理数据,因此这可能是一些程序员避免使用聚簇索引的原因。很难知道权衡取舍(存储时间,检索时间),特别是如果您正在设计的系统有一天可能存储数百万条记录。
#1
8
On any "normal" data or lookup table: no, I don't see any reason whatsoever.
在任何“正常”数据或查找表上:不,我没有看到任何理由。
On stuff like bulk import tables, or temporary tables - it depends.
在诸如批量导入表或临时表之类的东西上 - 这取决于。
To some people surprisingly, it appears that having a good clustered index actually can speed up operations like INSERT or UPDATE. See Kimberly Tripps excellent The Clustered Index Debate continues.... blog post in which she explains in great detail why this is the case.
令人惊讶的是,看起来拥有良好的聚簇索引实际上可以加速INSERT或UPDATE等操作。看到金伯利·特里普斯出色的聚集指数辩论继续....博客文章中,她详细解释了为什么会这样。
In this light: I don't see any valid reason not to have a good clustered index (narrow, stable, unique, ever-increasing = INT IDENTITY
as the most obvious choice) on any SQL Server table.
有鉴于此:我认为没有任何正当理由不在任何SQL Server表上拥有良好的聚簇索引(窄,稳定,唯一,不断增加= INT IDENTITY作为最明显的选择)。
To get some deep insights into how and why to choose clustering keys, read all of Kimberly Tripp's excellent blog posts on the topic:
要深入了解如何以及为何选择群集密钥,请阅读Kimberly Tripp关于该主题的所有优秀博文:
http://www.sqlskills.com/BLOGS/KIMBERLY/category/Clustering-Key.aspx
http://www.sqlskills.com/BLOGS/KIMBERLY/category/Clustering-Key.aspx
http://www.sqlskills.com/BLOGS/KIMBERLY/category/Clustered-Index.aspx
http://www.sqlskills.com/BLOGS/KIMBERLY/category/Clustered-Index.aspx
Excellent stuff from the "Queen of Indexing" ! :-)
来自“索引女王”的优秀作品! :-)
#2
6
Clustered Tables vs Heap Tables
集群表与堆表
(Good article on subject at www.mssqltips.com)
(关于主题的好文章,请访问www.mssqltips.com)
HEAP Table (Without clustered index)
HEAP表(没有聚簇索引)
-
Data is not stored in any particular order
数据不以任何特定顺序存储
-
Specific data can not be retrieved quickly, unless there are also non-clustered indexes
除非还存在非聚集索引,否则无法快速检索特定数据
-
Data pages are not linked, so sequential access needs to refer back to the index allocation map (IAM) pages
数据页未链接,因此顺序访问需要返回索引分配映射(IAM)页面
-
Since there is no clustered index, additional time is not needed to maintain the index
由于没有聚簇索引,因此不需要额外的时间来维护索引
-
Since there is no clustered index, there is not the need for additional space to store the clustered index tree
由于没有聚簇索引,因此不需要额外的空间来存储聚簇索引树
-
These tables have a index_id value of 0 in the sys.indexes catalog view
这些表在sys.indexes目录视图中的index_id值为0
Clustered Table
集群表
-
Data is stored in order based on the clustered index key
数据基于聚簇索引键按顺序存储
-
Data can be retrieved quickly based on the clustered index key, if the query uses the indexed columns
如果查询使用索引列,则可以基于聚簇索引键快速检索数据
-
Data pages are linked for faster sequential access Additional time is needed to maintain clustered index based on INSERTS, UPDATES and DELETES
链接数据页以加快顺序访问需要额外的时间来维护基于INSERTS,UPDATES和DELETES的聚簇索引
-
Additional space is needed to store clustered index tree These tables have a index_id value of 1 in the sys.indexes catalog view
存储聚簇索引树需要额外的空间这些表在sys.indexes目录视图中的index_id值为1
#3
1
Please read my answer under "No direct access to data row in clustered table - why?", first. Specifically item [2] Caveat.
请首先阅读我的回答“无法直接访问群集表中的数据行 - 为什么?”。具体项目[2]警告。
The people who created the "database" are cretins. They had:
创建“数据库”的人是cretins。他们有:
- a bunch of unnormalised spreadhseets, not normalised relational tables
- 一堆非规范化的spreadhseets,而不是规范化的关系表
- the PKs are all IDENTITY columns (the spreadsheets are linked to each other; they have to be navigated one-by-one-by-one); there is no relational access or relational power across the database
- PK都是IDENTITY列(电子表格彼此链接;它们必须逐个导航);数据库中没有关系访问或关系功能
- they had PRIMARY KEY, which produce UNIQUE CLUSTERED
- 他们有PRIMARY KEY,它产生了独特的聚集
- they found that that prevented concurrency
- 他们发现这阻止了并发性
- they removed the CI and made them all NCIs
- 他们删除了CI并使他们成为所有NCI
- they were too lazy to finish the reversal; to nominate an alternate (current NCI) to become the new CI, for each table
- 他们懒得完成逆转;为每个表提名一个备用(当前NCI)成为新的CI
- the IDENTITY column remains the Primary Key (it isn't really, but it is in this hamfisted implementation)
- IDENTITY列仍然是主键(它不是真的,但它在这个hamfisted实现中)
For such collections of spreadsheets masquerading as databases, it is becoming more and more common to avoid CIs altogether, and just have NCIs plus the Heap. Obviously they get none of the power or benefits of the CI, but hell, they get none of the power or benefit of Relational databases, so who cares that they get none of the power of CIs (which were designed for Relational databases, which theirs is not). The way they look at it, they have to "refactor" the darn thing every so often anyway, so why bother. Relational databases do not need "refactoring".
对于伪装成数据库的电子表格集合,完全避免CI会变得越来越普遍,并且只有NCI加上堆。显然他们没有得到CI的权力或好处,但是,他们没有得到关系数据库的权力或利益,所以谁关心他们没有得到CI的功能(这是为关系数据库设计的,他们的不是)。他们看待它的方式,他们不得不经常“重构”这个糟糕的事情,所以为什么要这么麻烦。关系数据库不需要“重构”。
If you need to discuss this response further, please post the CREATE TABLE/INDEX DDL; otherwise it is a time-wasting academic argument.
如果您需要进一步讨论此响应,请发布CREATE TABLE / INDEX DDL;否则这是浪费时间的学术争论。
#4
0
Here is another (have it already been provided in other answers?) possible reason (still to be understood):
这是另一个(已经在其他答案中提供过吗?)可能的原因(仍有待理解):
- SQL Server - Poor performance of PK delete
- SQL Server - PK删除性能不佳
I hope, I shall update later but for now it is rather the desire to link these topics
我希望,我稍后会更新,但现在更愿意将这些主题联系起来
Update:
What do I miss in understanding the clustered index?
更新:在理解聚簇索引时我会错过什么?
#5
0
With some b-tree servers/programming languages still used today, fixed or variable length flat ascii files are used for storing data. When a new data record/row is added to a file (table), the record is (1) appended to the end of the file (or replaces a deleted record) and (2) the indexes are balanced. When data is stored this way, you don't have to be concerned about system performance (as far as what the b-tree server is doing to return a pointer to the first data record). The response time is only effected by the # of nodes in your index files.
对于今天仍在使用的一些b树服务器/编程语言,固定或可变长度的平面ascii文件用于存储数据。将新数据记录/行添加到文件(表)时,记录将(1)附加到文件末尾(或替换已删除的记录)和(2)索引是平衡的。以这种方式存储数据时,您不必担心系统性能(就b-tree服务器执行的操作而言,返回指向第一个数据记录的指针)。响应时间仅受索引文件中的节点数的影响。
When you get into using SQL, you hopefully come to realize that system performance has to be considered whenever you write an SQL statement. Using an "ORDER BY" statement on a non-indexed column can bring a system to its knees. Using a clustered index might put an unnecessary load on the CPU. It's the 21st century and I wish we didn't have to think about system performance when programming in SQL, but we still do.
当您开始使用SQL时,您希望每当编写SQL语句时都必须考虑系统性能。在非索引列上使用“ORDER BY”语句可以使系统瘫痪。使用聚簇索引可能会给CPU带来不必要的负载。这是21世纪,我希望在SQL编程时我们不必考虑系统性能,但我们仍然这样做。
With some older programming languages, it was mandatory to use an index whenever sorted data is retrieved. I only wish this requirement was still in place today. I can only wonder how many companies have updated their slow computer systems due to a poorly written SQL statement on non-indexed data.
对于一些较旧的编程语言,每当检索到排序数据时都必须使用索引。我只希望今天这个要求仍然存在。我只能想知道有多少公司更新了他们的慢速计算机系统,因为在非索引数据上编写的SQL语句写得不好。
In my 25 years of programming, I've never needed my physical data stored in a particular order, so maybe that is why some programmers avoid using clustered indexes. It's hard to know what the tradeoff is (storage time, verses retrieval time) especially if the system you are designing might store millions of records someday.
在我25年的编程中,我从未需要以特定顺序存储我的物理数据,因此这可能是一些程序员避免使用聚簇索引的原因。很难知道权衡取舍(存储时间,检索时间),特别是如果您正在设计的系统有一天可能存储数百万条记录。