I've been studying indexes and trying to understand how they work and how I can use them to boost performance, but I'm missing something.
I have the following table:
Person:
| Id | Name | Email | Phone |
| 1 | John | E1 | P1 |
| 2 | Max | E2 | P2 |
I'm trying to find the best way to index the columns Email and Phone, considering that the queries will (most of the time) be of the form:
[1] SELECT * FROM Person WHERE Email = '...' OR Phone = '...'
[2] SELECT * FROM Person WHERE Email = ...
[3] SELECT * FROM Person WHERE Phone = ...
I thought the best approach would be to create a single index using both columns:
CREATE NONCLUSTERED INDEX [IX_EmailPhone]
ON [dbo].[Person]([Email], [Phone]);
However, with the index above, only query [2] benefits from an index seek; the others use an index scan.
I also tried creating multiple indexes: one with both columns, one for Email, and one for Phone. In this case, [2] and [3] use a seek, but [1] continues to use a scan.
Why can't the database use an index with an OR? What would be the best indexing approach for this table, considering the queries?
2 Solutions
#1
1
Create a separate index for each column.
By using hints we can force the optimizer to use/not use the indexes, so you can check the execution plan, get a feeling of the performance involved and understand the meaning of each path.
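Applied to the Person table from the question, "a separate index for each column" could look like the sketch below. The index names are made up for illustration, and the columns are assumed to be called Email and Phone as in the sample table.

```sql
-- Hypothetical index names; column names assumed from the sample table.
-- One single-column index per predicate column, so that each side of
-- "Email = ... OR Phone = ..." has an index it can seek on its own.
CREATE NONCLUSTERED INDEX IX_Person_Email ON dbo.Person (Email);
CREATE NONCLUSTERED INDEX IX_Person_Phone ON dbo.Person (Phone);
```

Whether the optimizer actually seeks both indexes for the OR query still depends on its cost estimates, which is what the demo explores.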
Go through my demo and consider the work involved in each path for the following scenarios:
- Only a few rows satisfy the condition j=123.
  Only a few rows satisfy the condition k=456.
- Most of the rows satisfy the condition j=123.
  Most of the rows satisfy the condition k=456.
- Only a few rows satisfy the condition j=123.
  Most of the rows satisfy the condition k=456.
Try to think about which path you would choose for each scenario.
Please feel free to ask questions.
Demo
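-- Build a 1,000,000-row demo table t (a heap, no clustered index):
-- i is a unique row number, j and k are random whole numbers in 0..999.
-- The CTE (also named t) generates 0..999 and is cross-joined with itself.
;with t(n) as (select 0 union all select n+1 from t where n < 999)
select 1+t0.n+1000*t1.n as i
,floor(rand(cast (newid() as varbinary))*1000) as j
,floor(rand(cast (newid() as varbinary))*1000) as k
into t
from t t0,t t1
option (maxrecursion 0)
;
-- One single-column index per predicate column.
create index t_j on t (j);
create index t_k on t (k);
-- Refresh the statistics on both indexes.
update statistics t (t_j);
update statistics t (t_k);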
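Before comparing the two hinted plans, it can help to check which of the scenarios the demo data actually falls into. The following is only a sketch against the demo table t; the column aliases are my own.

```sql
-- Count how many rows match each predicate, next to the table size.
select sum(case when j = 123 then 1 else 0 end) as rows_j_123
      ,sum(case when k = 456 then 1 else 0 end) as rows_k_456
      ,count(*)                                 as total_rows
from   t;

-- Report logical reads per statement, so the scan and seek variants
-- can be compared on I/O as well as on plan shape.
set statistics io on;
```

With values spread randomly over 0..999, each predicate should match only a small fraction of the rows, which corresponds to the first scenario.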
Scan
select *
from t (forcescan)
where j = 123
or k = 456
- This is straightforward.
Seek
select *
from t (forceseek)
where j = 123
or k = 456
- "Index Seek": Each index is searched for its relevant value (123 in t_j, 456 in t_k)
- "Merge Join": The results (row IDs) are concatenated (as in UNION ALL)
- "Stream Aggregate": Duplicate row IDs are eliminated
- "Rid Lookup" + "Nested Loops": The row IDs are used to retrieve the rows from the table (t)
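The same idea can be written out by hand, which may make the seek plan easier to read. This is a sketch of an equivalent query, not what the optimizer literally executes: it removes duplicates by column values rather than by row ID, which gives the same result here only because i is unique in the demo table.

```sql
-- Each branch filters on a single indexed column, so each can use its own
-- index seek. UNION (rather than UNION ALL) removes the rows that satisfy
-- both predicates and would otherwise be returned twice.
select i, j, k from t where j = 123
union
select i, j, k from t where k = 456;
```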
#2
0
Use two separate indexes, one on (email) and one on (phone, email).
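As a sketch, those two indexes might be declared as follows. The index names are hypothetical, and the columns are assumed to be named Email and Phone as in the question's sample table.

```sql
-- Hypothetical index names; column names assumed from the sample table.
-- Supports seeks on Email alone.
CREATE NONCLUSTERED INDEX IX_Person_Email ON dbo.Person (Email);

-- Phone is the leading key, so a predicate on Phone can seek; Email is
-- carried as the second key, so a filter on Email can be checked in the index.
CREATE NONCLUSTERED INDEX IX_Person_Phone_Email ON dbo.Person (Phone, Email);
```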
The OR is rather difficult. If your conditions were connected by AND rather than OR, then your index would be used for the first query (but not the third, because phone is not the first key in the index).
You can write the query as:
SELECT *
FROM Person
WHERE Email = '...'
UNION ALL
SELECT *
FROM Person
WHERE Email <> '...' AND Phone = '...';
SQL Server should use the appropriate index for each subquery.
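One caveat with this rewrite: if Email can be NULL, the predicate Email <> '...' evaluates to unknown for those rows, so a row with a NULL Email and a matching Phone is returned by the original OR query but not by the UNION ALL version. A variant that keeps such rows is sketched below. The variable names and data types are assumptions for illustration, and the '...' placeholders are kept as in the original.

```sql
-- Assumed variable names and types; the values are placeholders.
DECLARE @Email nvarchar(256) = N'...';
DECLARE @Phone nvarchar(32)  = N'...';

SELECT *
FROM Person
WHERE Email = @Email
UNION ALL
SELECT *
FROM Person
WHERE (Email <> @Email OR Email IS NULL)  -- keep rows whose Email is NULL
  AND Phone = @Phone;
```

If Email is declared NOT NULL, the extra IS NULL check is unnecessary and the original form is enough.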