Here's a table:
CREATE TABLE Meetings
(
ID int PRIMARY KEY IDENTITY(1,1),
StartDate DateTime NOT NULL,
EndDate DateTime NULL,
Field1 varchar(50),
Field2 varchar(50),
Field3 varchar(50),
Field4 varchar(50)
)
There are several thousand rows. The date ranges vary in size (from a couple of days up to 50 years).
Here's a query:
DECLARE @ApplicableDate DateTime
SELECT ID, StartDate, EndDate, Field1, Field2, Field3, Field4
FROM Meetings
WHERE StartDate <= @ApplicableDate AND
(EndDate IS NULL OR @ApplicableDate <= EndDate)
Since the date ranges can be large, a large portion of the table might be returned (20%-50% of the rows).
The query represents the rows I want in a simple way, but the performance is pretty bad. It does a clustered index scan, no matter what indexes I add. I've tried:
- StartDate
- StartDate, EndDate
How can I improve the performance of this query?
I've reviewed the answers for this question and this one too. Those solutions aren't helpful in my situation - I don't really want to muck with the business' data by creating a separate table of Dates to turn the query into an equality query (what happens when end date is modified, or null?), or by morphing the data to fit in a spatial index.
Still, I'm open to possible modifications to the data structure (particularly if they do not add rows and do not use strange data types).
2 Answers
#1
2
If the query returns 20%-50% of the records, then a scan is often the best option. With an index, you first have to find the data in the index, which contains the address of the record in the table, and then fetch the page containing that record from disk, with the risk that records adjacent in the index are spread all over the disk.
If you really need that many records and performance is bad, then maybe check the following:
- Is the disk speed an issue?
- Is it the network bandwidth?
- Are you restricted in RAM/Cache?
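To help tell those three causes apart, one option is to ask SQL Server for its own I/O and timing figures. This is only a minimal sketch, assuming the Meetings table from the question; the date literal is an arbitrary placeholder, not a value from the question.

```sql
-- Sketch: report page reads and CPU/elapsed time for the question's query.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Placeholder date (assumption); substitute a realistic value.
DECLARE @ApplicableDate datetime = '20100101';

SELECT ID, StartDate, EndDate, Field1, Field2, Field3, Field4
FROM Meetings
WHERE StartDate <= @ApplicableDate
  AND (EndDate IS NULL OR @ApplicableDate <= EndDate);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Many physical reads relative to logical reads would point toward disk or cache pressure. If the server-side elapsed time is small but the client still waits, the network or the size of the result set is a more likely suspect.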
#2
3
I'm assuming you're on SQL Server for my syntax.
Make your primary key on ID a non-clustered index.
ID int PRIMARY KEY NONCLUSTERED IDENTITY(1,1),
Create a clustered index on the StartDate column.
CREATE CLUSTERED INDEX ix_Meetings_StartDate
ON Meetings (StartDate)
Try your query as is. Even though the data is probably stored similarly to what you had with the clustered PK, now the query engine will know in advance that the data is clustered by the start date.
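Put together, the two changes might look like the following when the table is created from scratch. This is a sketch rather than a migration script: the constraint name PK_Meetings is a hypothetical name chosen for illustration, and an existing table would first need its current primary key constraint dropped.

```sql
-- Sketch: Meetings with a nonclustered primary key, clustered on StartDate.
CREATE TABLE Meetings
(
    ID        int IDENTITY(1,1) NOT NULL,
    StartDate datetime NOT NULL,
    EndDate   datetime NULL,
    Field1    varchar(50),
    Field2    varchar(50),
    Field3    varchar(50),
    Field4    varchar(50),
    -- PK_Meetings is an assumed name; the key is still enforced,
    -- but it no longer determines the physical row order.
    CONSTRAINT PK_Meetings PRIMARY KEY NONCLUSTERED (ID)
);

-- Rows are now stored in StartDate order.
CREATE CLUSTERED INDEX ix_Meetings_StartDate
    ON Meetings (StartDate);
```

With this layout, the StartDate <= @ApplicableDate half of the predicate can be answered by a range scan from the start of the clustered index, while the EndDate condition is still checked row by row.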