This question already has an answer here:
- What are the differences between a clustered and a non-clustered index? 10 answers
I need to add proper indexes to my tables and need some help.
I'm confused and need to clarify a few points:
- Should I use an index for non-int columns? Why or why not?
- I've read a lot about clustered and non-clustered indexes, yet I still can't decide when to use one over the other. A good example would help me and a lot of other developers.
I know that I shouldn't use indexes for columns or tables that are often updated. What else should I be careful about, and how can I know that it is all good before going to the test phase?
6 Answers
#1
79
You really need to keep two issues apart:
1) The primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes the most sense for your scenario.
2) The clustering key (the column or columns that define the "clustered index" on the table) is a physical, storage-related thing, and here a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.
By default, the primary key on a SQL Server table is also used as the clustering key - but it doesn't need to be that way!
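As a minimal sketch of keeping the two apart in T-SQL (the table, column, and constraint names here are hypothetical, not from the original post), a GUID can act as the logical primary key while a narrow, ever-increasing INT column serves as the clustering key:

```sql
-- Hypothetical table: every name below is an illustrative assumption.
CREATE TABLE dbo.Customer
(
    CustomerGuid UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_Customer_CustomerGuid DEFAULT NEWID(),
    CustomerId   INT IDENTITY(1, 1) NOT NULL,
    CustomerName NVARCHAR(100) NOT NULL,
    -- Logical primary key, declared NONCLUSTERED explicitly so that it
    -- does not become the clustering key by default.
    CONSTRAINT PK_Customer PRIMARY KEY NONCLUSTERED (CustomerGuid)
);

-- Physical clustering key: small, stable, ever-increasing.
CREATE UNIQUE CLUSTERED INDEX CIX_Customer_CustomerId
    ON dbo.Customer (CustomerId);
```

Whether a GUID is the right primary key at all depends on the scenario; the point of the sketch is only that the two choices can be made independently.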
One rule of thumb I would apply is this: any "regular" table (one that you use to store data in, a lookup table, etc.) should have a clustering key. There's really no point in not having a clustering key. Actually, contrary to common belief, having a clustering key speeds up all the common operations - even inserts and deletes (since the table organization is different and usually better than with a heap - a table without a clustering key).
Kimberly Tripp, the Queen of Indexing, has a great many excellent articles on why to have a clustering key and what kind of columns are best to use as your clustering key. Since you only get one per table, it's of utmost importance to pick the right clustering key - and not just any clustering key.
- GUIDs as PRIMARY KEY and/or clustered key
- The clustered index debate continues
- Ever-increasing clustering key - the Clustered Index Debate..........again!
- Disk space is cheap - that's not the point!
Marc
#2
268
A clustered index alters the way that the rows are stored. When you create a clustered index on a column (or a number of columns), SQL Server sorts the table's rows by that column (or those columns). It is like a dictionary, where all words are sorted in alphabetical order throughout the entire book.
A non-clustered index, on the other hand, does not alter the way the rows are stored in the table. It creates a completely separate object, associated with the table, that contains the column(s) selected for indexing and a pointer back to the table's rows containing the data. It is like the index in the last pages of a book, where keywords are sorted and list the page numbers of the relevant material for faster reference.
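To make the dictionary analogy concrete, here is a small T-SQL sketch. The table and index names are made up for illustration and are not part of the original answer:

```sql
-- Hypothetical table modelled on the dictionary analogy.
CREATE TABLE dbo.DictionaryEntry
(
    Word       NVARCHAR(50)  NOT NULL,
    Definition NVARCHAR(400) NOT NULL,
    Category   NVARCHAR(30)  NOT NULL
);

-- Clustered index: the table's rows themselves are kept in Word order,
-- like the body of a dictionary. Only one is allowed per table.
CREATE CLUSTERED INDEX CIX_DictionaryEntry_Word
    ON dbo.DictionaryEntry (Word);

-- Non-clustered index: a separate structure sorted by Category that
-- points back to the rows, like the index at the back of a book.
CREATE NONCLUSTERED INDEX IX_DictionaryEntry_Category
    ON dbo.DictionaryEntry (Category);
```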
#3
26
You should be using indexes to help SQL Server performance. Usually that implies that columns that are used to find rows in a table are indexed.
Clustered indexes make SQL Server order the rows on disk according to the index order. This implies that if you access data in the order of a clustered index, the data will already be present on disk in the correct order. However, if the column(s) of a clustered index are changed frequently, the row(s) will move around on disk, causing overhead - which generally is not a good idea.
Having many indexes is not good either. They are costly to maintain. So start out with the obvious ones, and then profile to see which ones you are missing and would benefit from. You do not need them all from the start; they can be added later on.
Most column data types can be used when indexing, but it is better to index small columns than large ones. It is also common to create indexes on groups of columns (e.g. country + city + street).
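A multi-column index like the country + city + street example might be sketched as follows. The table definition and all names are assumptions made for illustration:

```sql
-- Hypothetical address table; all names are illustrative.
CREATE TABLE dbo.Address
(
    AddressId INT IDENTITY(1, 1) NOT NULL
        CONSTRAINT PK_Address PRIMARY KEY CLUSTERED,
    Country   NVARCHAR(60)  NOT NULL,
    City      NVARCHAR(60)  NOT NULL,
    Street    NVARCHAR(100) NOT NULL
);

-- One index over a group of columns. Column order matters:
-- the index is sorted by Country, then City, then Street.
CREATE NONCLUSTERED INDEX IX_Address_Country_City_Street
    ON dbo.Address (Country, City, Street);

-- Filters on the leading column(s) can seek into this index.
SELECT AddressId
FROM dbo.Address
WHERE Country = N'Norway' AND City = N'Oslo';

-- A filter on Street alone cannot seek into it, because Street
-- is not the leading column of the index.
SELECT AddressId
FROM dbo.Address
WHERE Street = N'Storgata';
```

The order of the key columns is a design choice: the columns that queries filter on most often generally belong first.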
Also, you will not notice performance issues until you have quite a bit of data in your tables. Another thing to think about is that SQL Server needs statistics to do its query optimization the right way, so make sure that they are generated.
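As a rough sketch of how statistics can be checked and refreshed in T-SQL (`dbo.Address` is a hypothetical table name used only for illustration; substitute one of your own):

```sql
-- Check whether the current database creates and updates
-- statistics automatically.
SELECT name, is_auto_create_stats_on, is_auto_update_stats_on
FROM sys.databases
WHERE name = DB_NAME();

-- Refresh the statistics of one table by scanning all of its rows.
-- dbo.Address is an assumed, illustrative table name.
UPDATE STATISTICS dbo.Address WITH FULLSCAN;

-- Or refresh statistics for all tables in the current database.
EXEC sp_updatestats;
```

If the automatic options are switched on, manual refreshes are often only worth considering after large data loads.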
#4
20
A comparison of a non-clustered index with a clustered index, with an example
As an example of a non-clustered index, let's say that we have a non-clustered index on the EmployeeID column. A non-clustered index will store both the value of the EmployeeID AND a pointer to the row in the Employee table where that value is actually stored. A clustered index, on the other hand, will actually store the row data for a particular EmployeeID - so if you are running a query that looks for an EmployeeID of 15, the data from other columns in the table, like EmployeeName, EmployeeAddress, etc., will all actually be stored in the leaf node of the clustered index itself.
This means that with a non-clustered index, extra work is required to follow that pointer to the row in the table to retrieve any other desired values, whereas a clustered index can access the row directly, since the row is stored in the leaf level of the clustered index itself. So, reading from a clustered index is generally faster than reading from a non-clustered index when the query needs columns that the non-clustered index does not contain.
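The Employee example above could look like this in T-SQL. The column types, index names, and the sample name value are assumptions, since the answer does not give a table definition:

```sql
-- Assumed definition of the Employee table from the example.
CREATE TABLE dbo.Employee
(
    EmployeeID      INT           NOT NULL,
    EmployeeName    NVARCHAR(100) NOT NULL,
    EmployeeAddress NVARCHAR(200) NOT NULL,
    CONSTRAINT PK_Employee PRIMARY KEY CLUSTERED (EmployeeID)
);

-- Served by the clustered index alone: its leaf level holds the
-- whole row, so the other columns need no extra lookup.
SELECT EmployeeName, EmployeeAddress
FROM dbo.Employee
WHERE EmployeeID = 15;

-- Non-clustered index: stores EmployeeName plus a locator for the row.
CREATE NONCLUSTERED INDEX IX_Employee_EmployeeName
    ON dbo.Employee (EmployeeName);

-- Finding the name can use the non-clustered index, but fetching
-- EmployeeAddress needs an extra lookup into the clustered index.
SELECT EmployeeAddress
FROM dbo.Employee
WHERE EmployeeName = N'Alice';

-- A covering index avoids that lookup by carrying the extra column
-- in its leaf level.
CREATE NONCLUSTERED INDEX IX_Employee_EmployeeName_Covering
    ON dbo.Employee (EmployeeName)
    INCLUDE (EmployeeAddress);
```

The covering index is an addition beyond what the answer describes; it is shown only because it is a common way to remove the extra lookup.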
#5
4
In general, use an index on a column that's going to be used (a lot) to search the table, such as a primary key (which by default has a clustered index). For example, if you have the query (in pseudocode)
SELECT * FROM FOO WHERE FOO.BAR = 2;
You might want to put an index on FOO.BAR. A clustered index should be used on a column that will be used for sorting. A clustered index is used to sort the rows on disk, so you can only have one per table. For example, if you have the query
SELECT * FROM FOO ORDER BY FOO.BAR ASC;
You might want to consider a clustered index on FOO.BAR.
Probably the most important consideration is how much time your queries are taking. If a query doesn't take much time or isn't used very often, it may not be worth adding indexes. As always, profile first, then optimize. SQL Server Management Studio can give you suggestions on where to optimize, and MSDN has some information that you might find useful.
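One possible way to "profile first" in T-SQL is sketched below. It reuses the FOO/BAR placeholder names from the queries above; the index name is hypothetical, and the missing-index views only reflect activity since the last server restart and usually require the VIEW SERVER STATE permission:

```sql
-- Report I/O and timing for the statements that follow.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT * FROM FOO WHERE FOO.BAR = 2;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

-- Ask SQL Server which indexes the optimizer has wished for so far.
SELECT TOP (10)
    d.statement AS table_name,
    d.equality_columns,
    d.inequality_columns,
    d.included_columns,
    gs.user_seeks,
    gs.avg_user_impact
FROM sys.dm_db_missing_index_group_stats AS gs
JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_group_handle = gs.group_handle
JOIN sys.dm_db_missing_index_details AS d
    ON d.index_handle = g.index_handle
ORDER BY gs.avg_user_impact * gs.user_seeks DESC;

-- If the measurements justify it, add the index on the searched column.
CREATE NONCLUSTERED INDEX IX_FOO_BAR ON FOO (BAR);
```

These suggestions are hints rather than instructions; it is worth re-measuring the query after adding any index.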
#6
2
Clustered index: faster to read than a non-clustered index, as the data is physically stored in index order. We can create only one per table.
Non-clustered index: quicker for insert and update operations than a clustered index. We can create multiple non-clustered indexes per table.