I’ve just started looking into optimizing my queries through indexes because SQL data is growing large and fast. I looked at how the optimizer is processing my query through the Execution plan in SSMS and noticed that a Sort operator is being used. I’ve heard that a Sort operator indicates a bad design in the query since the sort can be made prematurely through an index. So here is an example table and data similar to what I’m doing:
IF OBJECT_ID('dbo.Store') IS NOT NULL DROP TABLE dbo.[Store]
GO
CREATE TABLE dbo.[Store]
(
[StoreId] int NOT NULL IDENTITY (1, 1),
[ParentStoreId] int NULL,
[Type] int NULL,
[Phone] char(10) NULL,
PRIMARY KEY ([StoreId])
)
INSERT INTO dbo.[Store] ([ParentStoreId], [Type], [Phone]) VALUES (10, 0, '2223334444')
INSERT INTO dbo.[Store] ([ParentStoreId], [Type], [Phone]) VALUES (10, 0, '3334445555')
INSERT INTO dbo.[Store] ([ParentStoreId], [Type], [Phone]) VALUES (10, 1, '0001112222')
INSERT INTO dbo.[Store] ([ParentStoreId], [Type], [Phone]) VALUES (10, 1, '1112223333')
GO
Here is an example query:
SELECT [Phone]
FROM [dbo].[Store]
WHERE [ParentStoreId] = 10
AND ([Type] = 0 OR [Type] = 1)
ORDER BY [Phone]
I create a nonclustered index to help speed up the query:
CREATE NONCLUSTERED INDEX IX_Store ON dbo.[Store]([ParentStoreId], [Type], [Phone])
To build the IX_Store index, I start with the simple predicates:
[ParentStoreId] = 10
AND ([Type] = 0 OR [Type] = 1)
Then I add the [Phone] column for the ORDER BY and to cover the SELECT output.
So even when the index is built, the optimizer still uses the Sort operator (and not the index sort) because [Phone] is sorted AFTER [ParentStoreId] AND [Type]. If I remove the [Type] column from the index and run the query:
SELECT [Phone]
FROM [dbo].[Store]
WHERE [ParentStoreId] = 10
--AND ([Type] = 0 OR [Type] = 1)
ORDER BY [Phone]
Then of course the Sort operator is not used by the optimizer, because within a single [ParentStoreId] the index entries are already ordered by [Phone].
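To make the ordering concrete, here is a rough Python model of the two key orders, using the four sample rows from above. It is only a sketch: a plain sort on the key tuple stands in for the B-tree order of the index, and the helper name `phones_in_index_order` is made up for illustration.

```python
# Sample rows from the question: (ParentStoreId, Type, Phone).
rows = [
    (10, 0, "2223334444"),
    (10, 0, "3334445555"),
    (10, 1, "0001112222"),
    (10, 1, "1112223333"),
]

# Column positions inside each row tuple.
COLUMN = {"ParentStoreId": 0, "Type": 1, "Phone": 2}


def phones_in_index_order(rows, key_columns):
    """Return the Phone values in the order an index on key_columns would hold them.

    Hypothetical helper: sorting on the key tuple stands in for the index order.
    """
    ordered = sorted(rows, key=lambda r: tuple(r[COLUMN[c]] for c in key_columns))
    return [r[COLUMN["Phone"]] for r in ordered]


with_type = phones_in_index_order(rows, ["ParentStoreId", "Type", "Phone"])
without_type = phones_in_index_order(rows, ["ParentStoreId", "Phone"])

# With [Type] in the middle, Phone is ordered only inside each Type group.
print(with_type)
print(with_type == sorted(with_type))

# Without [Type], Phone is ordered across the whole ParentStoreId range.
print(without_type)
print(without_type == sorted(without_type))
```

With [Type] in the key, the phones come out as two sorted runs (one per Type) rather than one sorted list, which is why a Sort is still needed when both types are requested.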
So the question is: how can I create an index that will cover the query (including the [Type] predicate) and not have the optimizer use a Sort?
EDIT:
The table I'm working with has more than 20 million rows
1 Answer
First, you should verify that the sort is actually a performance bottleneck. The duration of the sort will depend on the number of elements to be sorted, and the number of stores for a particular parent store is likely to be small. (That is assuming the sort operator is applied after applying the where clause).
"I’ve heard that a Sort operator indicates a bad design in the query since the sort can be made prematurely through an index"
That's an over-generalization. Often, a Sort operator can trivially be moved into the index. If only the first few rows of the result set are fetched, doing so can substantially reduce query cost: the database no longer has to fetch all matching rows (and sort them all) to find the first ones, but can read the records in result set order and stop once enough records are found.
In your case, you seem to be fetching the entire result set, so the sort is unlikely to make things much worse (unless the result set is huge). Also, in your case it might not be trivial to build a useful sorted index, because the WHERE clause contains an OR.
Now, if you still want to get rid of that sort-operator, you can try:
SELECT [Phone]
FROM [dbo].[Store]
WHERE [ParentStoreId] = 10
AND [Type] in (0, 1)
ORDER BY [Phone]
Alternatively, you can try the following index:
CREATE NONCLUSTERED INDEX IX_Store ON dbo.[Store]([ParentStoreId], [Phone], [Type])
This is an attempt to get the query optimizer to do an index range scan on ParentStoreId only, then scan all matching rows in the index, outputting them if Type matches. However, this is likely to cause more disk I/O, and hence slow your query down rather than speed it up.
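As a rough illustration of that trade-off, the Python sketch below walks a hand-written list of index entries in (ParentStoreId, Phone, Type) order. The rows with Type = 2 and ParentStoreId = 11 are invented for the example, and `range_scan_with_filter` is a hypothetical stand-in for what the scan does, not SQL Server's actual implementation.

```python
# Hypothetical index entries, already in (ParentStoreId, Phone, Type) key order.
# The Type = 2 and ParentStoreId = 11 rows are invented; they are not in the question's data.
index_entries = [
    (10, "0001112222", 1),
    (10, "0009998888", 2),
    (10, "1112223333", 1),
    (10, "2223334444", 0),
    (10, "3334445555", 0),
    (10, "4445556666", 2),
    (11, "5556667777", 0),
]


def range_scan_with_filter(entries, parent_store_id, wanted_types):
    """Read every entry for one ParentStoreId and keep those whose Type matches.

    Returns the phones in index order plus the number of entries touched.
    """
    touched = 0
    phones = []
    for parent, phone, store_type in entries:
        if parent != parent_store_id:
            continue  # outside the range; a real seek would not read these at all
        touched += 1
        if store_type in wanted_types:
            phones.append(phone)
    return phones, touched


phones, touched = range_scan_with_filter(index_entries, 10, {0, 1})
print(phones)                 # already in Phone order, no sort step needed
print(touched, len(phones))   # entries read vs. rows returned
```

The output needs no sort, but every entry for the parent store is read, including those whose Type is later rejected. How much that costs depends on how selective the [Type] predicate is on the real table.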
Edit: As a last resort, you could use
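-- Run these as two separate queries: T-SQL does not accept ORDER BY
-- inside the individual branches of a UNION ALL.
-- Query 1: phones for Type 0, in Phone order
SELECT [Phone]
FROM [dbo].[Store]
WHERE [ParentStoreId] = 10
AND [Type] = 0
ORDER BY [Phone];
-- Query 2: phones for Type 1, in Phone order
SELECT [Phone]
FROM [dbo].[Store]
WHERE [ParentStoreId] = 10
AND [Type] = 1
ORDER BY [Phone];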
with
CREATE NONCLUSTERED INDEX IX_Store ON dbo.[Store]([ParentStoreId], [Type], [Phone])
and combine the two lists on the application server, where you can merge the presorted lists (as in merge sort), thereby avoiding a complete sort. But that's really a micro-optimization: while it speeds up the sort itself by an order of magnitude, it is unlikely to affect the total execution time of the query much, as I'd expect the bottleneck to be network and disk I/O, especially since the disk will do a lot of random access as the index is not clustered.
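A minimal sketch of that merge step, assuming a Python application server (the original does not say what the application is written in). `merge_presorted` is a made-up helper name, and the two input lists stand in for the result sets of the Type = 0 and Type = 1 queries.

```python
import heapq


def merge_presorted(*sorted_lists):
    """Merge already-sorted sequences into one sorted list without re-sorting.

    heapq.merge walks the inputs in step, as in the merge phase of merge sort.
    """
    return list(heapq.merge(*sorted_lists))


# Stand-ins for the two result sets, each already ordered by Phone.
type0_phones = ["2223334444", "3334445555"]
type1_phones = ["0001112222", "1112223333"]

merged = merge_presorted(type0_phones, type1_phones)
print(merged)
```

This assumes the application compares strings the same way the database orders [Phone]. For fixed-width digit strings like these the two agree, but with other data and collations that would need checking.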