I have a table of a little over 1 billion rows of time-series data with fantastic insert performance but (sometimes) awful select performance.
Table tblTrendDetails (PK is ordered as shown):
PK TrendTime datetime
PK CavityId int
PK TrendValueId int
TrendValue real
The table is continuously pulling in new data and purging old data, so insert and delete performance needs to remain snappy.
When executing a query such as the following, performance is poor (30 sec):
SELECT *
FROM tblTrendDetails
WHERE TrendTime BETWEEN @inMinTime AND @inMaxTime
AND CavityId = @inCavityId
AND TrendValueId = @inTrendId
If I execute the same query again (with similar times, but any @inCavityId or @inTrendId), performance is very good (1 sec). Performance counters show that disk access is the culprit the first time the query is run.
Any recommendations regarding how to improve performance without (significantly) adversely affecting the insert or delete performance? Any suggestions (including completely changing the underlying database) are welcome.
1 Solution
#1
6
The fact that subsequent queries of the same or similar data run much faster is probably due to SQL Server caching your data. That said, is it possible to speed this initial query up?
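One way to check the caching explanation is to compare physical and logical reads on a cold and a warm cache. The following is only a sketch: the parameter values are placeholders that do not come from the question, and `DBCC DROPCLEANBUFFERS` empties the buffer cache for the whole instance, so it is assumed to run on a test server only.

```sql
-- TEST SERVER ONLY: write dirty pages to disk, then empty the buffer cache
-- so the next query starts cold.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
GO

SET STATISTICS IO ON;
GO

-- Placeholder values; substitute a real time range and real ids.
DECLARE @inMinTime datetime, @inMaxTime datetime, @inCavityId int, @inTrendId int;
SET @inMinTime  = '20240101 00:00';
SET @inMaxTime  = '20240101 01:00';
SET @inCavityId = 1;
SET @inTrendId  = 1;

SELECT *
FROM tblTrendDetails
WHERE TrendTime BETWEEN @inMinTime AND @inMaxTime
  AND CavityId = @inCavityId
  AND TrendValueId = @inTrendId;
GO

SET STATISTICS IO OFF;
GO
```

Run the middle batch twice without flushing the cache in between. If the first run reports a large number of physical reads and the second reports almost none for a similar count of logical reads, the difference is the buffer cache at work.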
Verify the query plan:
My guess is that your query should result in an Index Seek rather than an Index Scan (or worse, a Table Scan). Please verify this using SET SHOWPLAN_TEXT ON; or a similar feature. Using BETWEEN and = as your query does should really take advantage of the clustered index, though how much that helps is debatable: with TrendTime as the leading key column, the seek covers every row in the time range, and CavityId and TrendValueId only filter rows within that range.
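A minimal sketch of that check follows. The parameter values are placeholders, and the statements are compiled but not executed while `SHOWPLAN_TEXT` is on. `SET SHOWPLAN_TEXT` has to be the only statement in its batch, hence the `GO` separators.

```sql
SET SHOWPLAN_TEXT ON;
GO

-- Placeholder values; only the plan is returned, no rows are read.
DECLARE @inMinTime datetime, @inMaxTime datetime, @inCavityId int, @inTrendId int;
SET @inMinTime  = '20240101 00:00';
SET @inMaxTime  = '20240101 01:00';
SET @inCavityId = 1;
SET @inTrendId  = 1;

SELECT *
FROM tblTrendDetails
WHERE TrendTime BETWEEN @inMinTime AND @inMaxTime
  AND CavityId = @inCavityId
  AND TrendValueId = @inTrendId;
GO

SET SHOWPLAN_TEXT OFF;
GO
```

In the plan text, look for a Clustered Index Seek on the primary key rather than a Clustered Index Scan.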
Index Fragmentation:
It is possible that your clustered index (the primary key in this case) is quite fragmented after all of those inserts and deletes. I would probably check this with DBCC SHOWCONTIG (tblTrendDetails).
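As a sketch of what that check could look like on a table this large: the `FAST` option skips the leaf-page reads that make a full `DBCC SHOWCONTIG` slow, and `TABLERESULTS` returns the figures as a rowset. The second statement is an alternative for SQL Server 2005 and later; the `dbo` schema is an assumption.

```sql
-- SQL Server 2000 and later (deprecated in newer versions).
-- FAST does a quicker, less detailed scan; ALL_INDEXES covers every index.
DBCC SHOWCONTIG ('tblTrendDetails') WITH FAST, TABLERESULTS, ALL_INDEXES;
GO

-- SQL Server 2005 and later: the dynamic management function equivalent.
-- 'LIMITED' is the cheapest scan mode; the dbo schema is assumed.
SELECT index_id,
       avg_fragmentation_in_percent,
       page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.tblTrendDetails'), NULL, NULL, 'LIMITED');
GO
```

High logical fragmentation (or low scan density) on the clustered index would support the fragmentation theory.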
You can defrag the table's indexes with DBCC INDEXDEFRAG (MyDatabase, tblTrendDetails). This may take some time, but will allow the table to remain accessible, and you can stop the operation without any nasty side-effects.
You might have to go further and use DBCC DBREINDEX (tblTrendDetails). This is an offline operation, though, so you should only do this when the table does not need to be accessed.
There are some differences described here: Microsoft SQL Server 2000 Index Defragmentation Best Practices.
Be aware that your transaction log can grow quite a bit from defragging a large table, and it can take a long time.
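For reference, the two maintenance options above might be written out as follows when targeting only the clustered primary key. The index name `PK_tblTrendDetails` and the `dbo` schema are hypothetical; substitute the real names. The `ALTER INDEX` forms are the replacements available from SQL Server 2005 onward.

```sql
-- Online defragmentation of one index (index name is hypothetical).
-- Can be stopped at any time; work done so far is kept.
DBCC INDEXDEFRAG (MyDatabase, tblTrendDetails, PK_tblTrendDetails);
GO

-- Offline rebuild of the same index; the table is unavailable while it runs.
DBCC DBREINDEX ('tblTrendDetails', 'PK_tblTrendDetails');
GO

-- SQL Server 2005 and later equivalents of the two commands above.
ALTER INDEX PK_tblTrendDetails ON dbo.tblTrendDetails REORGANIZE;
GO
ALTER INDEX PK_tblTrendDetails ON dbo.tblTrendDetails REBUILD;
GO
```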
Partitioned Views:
If these do not remedy the situation (or fragmentation is not a problem), you may even wish to look to partitioned views, in which you create a bunch of underlying base tables for various ranges of records, then union them all up in a view (replacing your original table).
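A rough sketch of a partitioned view for this table, split by month on TrendTime. The table names, view name, date ranges, and the nullability of TrendValue are all assumptions made for illustration. The CHECK constraints on TrendTime are what allow the optimizer to skip base tables that fall outside the queried range.

```sql
-- One base table per month (names and date ranges are hypothetical).
CREATE TABLE dbo.tblTrendDetails_2024_01 (
    TrendTime    datetime NOT NULL
        CHECK (TrendTime >= '20240101' AND TrendTime < '20240201'),
    CavityId     int      NOT NULL,
    TrendValueId int      NOT NULL,
    TrendValue   real     NULL,  -- nullability is assumed
    CONSTRAINT PK_tblTrendDetails_2024_01
        PRIMARY KEY CLUSTERED (TrendTime, CavityId, TrendValueId)
);

CREATE TABLE dbo.tblTrendDetails_2024_02 (
    TrendTime    datetime NOT NULL
        CHECK (TrendTime >= '20240201' AND TrendTime < '20240301'),
    CavityId     int      NOT NULL,
    TrendValueId int      NOT NULL,
    TrendValue   real     NULL,
    CONSTRAINT PK_tblTrendDetails_2024_02
        PRIMARY KEY CLUSTERED (TrendTime, CavityId, TrendValueId)
);
GO

-- The view unions the base tables and stands in for the original table.
CREATE VIEW dbo.vwTrendDetails
AS
SELECT TrendTime, CavityId, TrendValueId, TrendValue
FROM dbo.tblTrendDetails_2024_01
UNION ALL
SELECT TrendTime, CavityId, TrendValueId, TrendValue
FROM dbo.tblTrendDetails_2024_02;
GO
```

A side benefit for the purge workload described in the question: once a month has aged out, its base table can be removed from the view and dropped, which is far cheaper than deleting the same rows one by one.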
Better Stuff:
If performance of these selects is a real business need, you may be able to make the case for better hardware: faster drives, more memory, etc. Since the slow first run is bound by disk access, drives that are twice as fast should cut that query's time roughly in half, yeah? Also, this may not be workable for you, but I've simply found newer versions of SQL Server to be truly faster, with more options and easier maintenance. I'm glad to have moved most of my company's data to 2008 R2. But I digress...