I'm indexing the content of a CMS with Lucene, so I extended the SQL Server database schema with an "IsIndexed" bit column that lets the Lucene indexer find content that hasn't been indexed yet.
I added an index to the Content table so that lookups on the IsIndexed column should go faster. This is what the database looks like:
CREATE TABLE Content (
DocumentId bigint,
CategoryId bigint,
Title nvarchar(255),
AuthorUserId bigint,
Body nvarchar(MAX),
IsIndexed bit
)
CREATE TABLE Users (
UserId bigint,
UserName nvarchar(20)
)
The following indexes exist:
Content (
PK_Content (Clustered) : DocumentId ASC
IX_CategoryId (Non-Unique, Non-Clustered) : CategoryId ASC
IX_AuthorUserId (Non-Unique, Non-Clustered) : AuthorUserId ASC
IX_Indexed_ASC (Non-Unique, Non-Clustered) : IsIndexed ASC, DocumentId ASC
IX_Indexed_DESC (Non-Unique, Non-Clustered) : IsIndexed DESC, DocumentId ASC
)
Users (
PK_Users (Clustered) : UserId
)
This is the query used to find nonindexed content:
SELECT
TOP 1
Content.DocumentId,
Content.CategoryId,
Content.Title,
Content.AuthorUserId,
Content.Body,
Users.UserName
FROM
Content
INNER JOIN Users ON Content.AuthorUserId = Users.UserId
WHERE
IsIndexed = 0
However, when I run it, the Actual Execution Plan reports a Clustered Index Scan on PK_Content combined with a Clustered Index Seek on PK_Users. The query takes about 300ms to execute.
When I modify the query to remove the Users.UserName field and the Users inner join, the query takes about 60ms to run and there is no Clustered Index Scan on PK_Content, only a Clustered Index Seek on PK_Content.
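For reference, a sketch of the reduced query described above. The SET STATISTICS statements are an addition that the question does not mention; they are one way to compare logical reads and elapsed time between the two versions alongside the execution plan.

```sql
-- Report I/O and timing figures for each statement (shown in the Messages tab in SSMS).
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Reduced query: the join to Users and the Users.UserName column are removed.
SELECT
    TOP 1
    Content.DocumentId,
    Content.CategoryId,
    Content.Title,
    Content.AuthorUserId,
    Content.Body
FROM
    Content
WHERE
    Content.IsIndexed = 0;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```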
I tried this before and after adding a descending index on the Content.IsIndexed column. I also added Content.DocumentId to the IX_Indexed indexes, but it made no difference.
What am I doing wrong? I've made all the necessary indexes (and then some). The Content table has hundreds of thousands of rows, as does the Users table, so I can't see why the optimiser would choose a scan.
2 Solutions
#1
1
An index on such a low-selectivity column (only two values, 0 and 1) is generally going to be ignored; see the tipping point. One option is to make it the leftmost key in the clustered index, and make the primary key constraint on DocumentId a non-clustered index:
CREATE TABLE Content (
DocumentId bigint,
CategoryId bigint,
Title nvarchar(255),
AuthorUserId bigint,
Body nvarchar(MAX),
IsIndexed bit,
constraint pk_DocumentId primary key nonclustered (DocumentId)
)
create unique clustered index cdxContent on Content (IsIndexed, DocumentId);
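Since the Content table already exists, the same layout could be reached by changing it in place rather than recreating it. This is only a sketch under assumptions not stated in the question: that PK_Content is a PRIMARY KEY constraint (not just an index), that DocumentId is declared NOT NULL, and that no foreign keys reference the constraint (they would have to be dropped and recreated around it).

```sql
-- Drop the existing clustered primary key; the table temporarily becomes a heap.
ALTER TABLE Content DROP CONSTRAINT PK_Content;

-- Create the new clustered index first, so the non-clustered indexes
-- are rebuilt against the final clustering key.
CREATE UNIQUE CLUSTERED INDEX cdxContent
    ON Content (IsIndexed, DocumentId);

-- Re-add the primary key on DocumentId as a non-clustered constraint.
ALTER TABLE Content
    ADD CONSTRAINT PK_Content PRIMARY KEY NONCLUSTERED (DocumentId);
```

With this layout, flipping IsIndexed from 0 to 1 changes the clustering key, so each update moves the row within the clustered index. On a large table the rebuild itself can take a while and is best tried on a copy first.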
Another option is to create a filtered covering index:
create unique index nonIndexedContent on Content (DocumentId)
include (CategoryId, Title, AuthorUserId, Body)
where IsIndexed = 0;
This second option could duplicate a lot of content, since the Body column is included in the index. Personally, I would go with the first option.
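If the duplication is a concern, a narrower variant is conceivable: a filtered index that carries only the key, leaving Body in the base table. This is a sketch, not something the answer proposes; the index name nonIndexedContentNarrow, the table aliases, and the ORDER BY are assumptions added for illustration, and filtered indexes require SQL Server 2008 or later. Whether the optimiser actually picks the index depends on the statistics and on the predicate being written as the literal IsIndexed = 0.

```sql
-- Filtered index holding only the rows still waiting to be indexed.
-- No INCLUDE list, so Body is not duplicated.
CREATE UNIQUE NONCLUSTERED INDEX nonIndexedContentNarrow
    ON Content (DocumentId)
    WHERE IsIndexed = 0;

-- The original query, with the missing comma fixed and an explicit ORDER BY
-- (an assumption) so that TOP 1 is deterministic.
SELECT TOP 1
    c.DocumentId,
    c.CategoryId,
    c.Title,
    c.AuthorUserId,
    c.Body,
    u.UserName
FROM Content AS c
INNER JOIN Users AS u ON c.AuthorUserId = u.UserId
WHERE c.IsIndexed = 0
ORDER BY c.DocumentId;
```

The trade-off is that the remaining columns have to be fetched from the base table with a lookup, which is cheap for a single row.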
#2
2
Add an index to Content on both the IsIndexed field and the AuthorUserId field; it should do a seek then. Depending on your version of SQL Server, you could add an INCLUDE clause with the fields you're using in the select to possibly gain more speed.
IX_Indexed_AuthorUserId (Non-Unique, Non-Clustered) : IsIndexed, AuthorUserId
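Written out as DDL, the index above might look like the following sketch. The INCLUDE list is an assumption: it covers the smaller columns from the select and deliberately leaves out Body to keep the index compact. INCLUDE is available from SQL Server 2005 onwards.

```sql
-- Composite index: filter column first, then the join column.
CREATE NONCLUSTERED INDEX IX_Indexed_AuthorUserId
    ON Content (IsIndexed, AuthorUserId)
    INCLUDE (CategoryId, Title);
```

DocumentId does not need to be listed, because the clustered key is carried in every non-clustered index. Body would still be fetched from the clustered index with a lookup, unless it is added to the INCLUDE list at the cost of a much larger index.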