In my application I have two queries that will be used quite frequently. The WHERE clauses of these queries are the following:
WHERE FieldA = @P1 AND (FieldB = @P2 OR FieldC = @P2)
and
WHERE FieldA = @P1 AND FieldB = @P2
@P1 and @P2 are parameters entered in the UI or coming from external data sources.
- FieldA is an int and highly non-unique, meaning only two, three or four different values in a table with, say, 20,000 rows.
- FieldB is a varchar(20) and is "almost" unique: there will be only very few rows where FieldB has the same value.
- FieldC is a varchar(15) and also highly distinct, but not as much as FieldB.
- FieldA and FieldB together are unique (but do not form my primary key, which is a simple auto-incrementing identity column with a clustered index).
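For orientation, the table described above could look roughly like the sketch below. The names dbo.MyTable, ID and PK_MyTable are the ones that appear in the answers further down. The NOT NULL constraints are an assumption, since the question does not say whether the fields are nullable.

```sql
-- Sketch of the table described in the question (T-SQL).
-- Assumption: all three fields are NOT NULL; the question does not state this.
CREATE TABLE dbo.MyTable
(
    ID     int IDENTITY(1,1) NOT NULL,  -- auto-incrementing identity column
    FieldA int         NOT NULL,        -- only a handful of distinct values
    FieldB varchar(20) NOT NULL,        -- "almost" unique
    FieldC varchar(15) NOT NULL,        -- highly distinct, but less so than FieldB
    CONSTRAINT PK_MyTable PRIMARY KEY CLUSTERED (ID)
);
```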
I'm now wondering what the best way is to define an index that speeds up specifically these two queries. Should I define one index with...
FieldB (or better FieldC here?)
FieldC (or better FieldB here?)
FieldA
... or better two indices:
FieldB
FieldA
and
FieldC
FieldA
Or are there even other and better options? What's the best way and why?
Thank you for suggestions in advance!
Edit:
Just as information for other readers: there was another answer here which has since been deleted. It actually seemed very useful to me. The recommendation was to create two indices (my second option above) and to reformulate the first query as a UNION of two SELECT statements (one with WHERE FieldA = @P1 AND FieldB = @P2 and one with WHERE FieldA = @P1 AND FieldC = @P2) instead of using OR, in order to benefit from both indices (which, according to that answer, wouldn't be the case with the OR operator).
Edit2:
The statement that the indexes are not used with OR and that a UNION is preferable seems to be wrong, at least according to my own tests (see my own answer below).
2 Answers
#1
3
Extending Remus' (edit: now deleted) answer...
- If @p2 is varchar(15), then you can't compare it against FieldB reliably, because FieldB is varchar(20).
- If @p2 is varchar(20), then FieldC will be converted to varchar(20) and the query will not use an index (or will at best scan it).
- If @p1 only has two, three or four values, then why not use tinyint and reduce the table/index size?
I wouldn't bother with indexes until you resolve this datatype precedence issue; it comes on top of the OR clause issue.
Finally, a column is either unique or non-unique: there is no in-between. Statistics help here with selectivity, but that's irrelevant.
I would reverse the indexes from Remus' answer to be FieldB, FieldA (and unique) and FieldC, FieldA because of FieldA's selectivity.
Edit, after comments: you can't compare the use of @p2 against the use of constant strings.
#2
0
I'm adding my own answer after some tests with a larger database (in SQL Server 2008):
First, I decided on the second option, that is, I created two indexes:
CREATE UNIQUE NONCLUSTERED INDEX [IX_B] ON [dbo].[MyTable]
(
[FieldB] ASC,
[FieldA] ASC
)
CREATE NONCLUSTERED INDEX [IX_C] ON [dbo].[MyTable]
(
[FieldC] ASC,
[FieldA] ASC
)
I then tested two queries:
declare @p1 int = 1;
declare @p2 varchar(20) = '12345678';
select * from MyTable
where FieldA=@p1 and (FieldB=@p2 or FieldC=@p2);
Running this query, I get the following query plan (ID is the primary key of the table, PK_MyTable the clustered index on the primary key):
|--Nested Loops(Inner Join, OUTER REFERENCES:([MyDb].[dbo].[MyTable].[ID]))
     |--Stream Aggregate(GROUP BY:([MyDb].[dbo].[MyTable].[ID]) DEFINE:([MyDb].[dbo].[MyTable].[FieldA]=ANY([MyDb].[dbo].[MyTable].[FieldA])))
     |    |--Merge Join(Concatenation)
     |         |--Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[IX_B]), SEEK:([MyDb].[dbo].[MyTable].[FieldB]=[@p2] AND [MyDb].[dbo].[MyTable].[FieldA]=[@p1]) ORDERED FORWARD)
     |         |--Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[IX_C]), SEEK:([MyDb].[dbo].[MyTable].[FieldC]=[@p2] AND [MyDb].[dbo].[MyTable].[FieldA]=[@p1]) ORDERED FORWARD)
     |--Clustered Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[PK_MyTable]), SEEK:([MyDb].[dbo].[MyTable].[ID]=[MyDb].[dbo].[MyTable].[ID]) LOOKUP ORDERED FORWARD)
So it seems both indexes are used ("Index Seek").
Elapsed time for the query: 00:00:00.2220127
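For readers who want to reproduce a text plan like the one above, one possible way is SET SHOWPLAN_TEXT, sketched below. The answer does not say how the plan was actually captured, so treat this as an assumption. With this setting on, SQL Server returns the estimated plan and does not execute the query.

```sql
-- Show the estimated plan as text instead of executing the query.
-- SET SHOWPLAN_TEXT must be the only statement in its batch, hence the GO separators
-- (GO is a batch separator understood by SSMS/sqlcmd, not a T-SQL statement).
SET SHOWPLAN_TEXT ON;
GO
DECLARE @p1 int = 1;
DECLARE @p2 varchar(20) = '12345678';
SELECT * FROM dbo.MyTable
WHERE FieldA = @p1 AND (FieldB = @p2 OR FieldC = @p2);
GO
SET SHOWPLAN_TEXT OFF;
GO
```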
The second query I tested used a UNION to avoid the OR operator (see the "Edit" in my question):
declare @p1 int = 1;
declare @p2 varchar(20) = '12345678';
select * from MyTable where FieldA=@p1 and FieldB=@p2
union
select * from MyTable where FieldA=@p1 and FieldC=@p2;
This query has the following query plan:
|--Merge Join(Union)
     |--Nested Loops(Inner Join, OUTER REFERENCES:([MyDb].[dbo].[MyTable].[ID]))
     |    |--Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[IX_B]), SEEK:([MyDb].[dbo].[MyTable].[FieldB]=[@p2] AND [MyDb].[dbo].[MyTable].[FieldA]=[@p1]) ORDERED FORWARD)
     |    |--Clustered Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[PK_MyTable]), SEEK:([MyDb].[dbo].[MyTable].[ID]=[MyDb].[dbo].[MyTable].[ID]) LOOKUP ORDERED FORWARD)
     |--Nested Loops(Inner Join, OUTER REFERENCES:([MyDb].[dbo].[MyTable].[ID]))
          |--Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[IX_C]), SEEK:([MyDb].[dbo].[MyTable].[FieldC]=[@p2] AND [MyDb].[dbo].[MyTable].[FieldA]=[@p1]) ORDERED FORWARD)
          |--Clustered Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[PK_MyTable]), SEEK:([MyDb].[dbo].[MyTable].[ID]=[MyDb].[dbo].[MyTable].[ID]) LOOKUP ORDERED FORWARD)
Again, both indexes are used ("Index Seek").
Elapsed time for the query: 00:00:00.3710212
Note: For both queries it doesn't matter which length I declare @p2 with: varchar(8), varchar(20) and varchar(30) all give the same results and query plans.
Given these results, I will stay with the OR operator instead of the UNION, since both queries use the indexes but the first one is faster.
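Anyone wanting to repeat the comparison on their own data could time both variants in one session with SET STATISTICS TIME, as in the sketch below. This is not necessarily how the timings above were taken. A single run can also be skewed by caching, so several runs of each variant are advisable.

```sql
-- Compare the OR and UNION variants; timings appear in the Messages output.
SET STATISTICS TIME ON;   -- report parse/compile and execution times per statement

DECLARE @p1 int = 1;
DECLARE @p2 varchar(20) = '12345678';

-- Variant 1: OR
SELECT * FROM dbo.MyTable
WHERE FieldA = @p1 AND (FieldB = @p2 OR FieldC = @p2);

-- Variant 2: UNION
SELECT * FROM dbo.MyTable WHERE FieldA = @p1 AND FieldB = @p2
UNION
SELECT * FROM dbo.MyTable WHERE FieldA = @p1 AND FieldC = @p2;

SET STATISTICS TIME OFF;
```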