What is an efficient way to paginate very large result sets in SQL Server 2005?

Time: 2021-05-13 10:18:18

EDIT: I'm still waiting for more answers. Thanks!


In the SQL 2000 days, I used the temp table method: create a temp table with a new identity column as its primary key, then select the rows whose identity value is between A and B.

When SQL 2005 came along I found out about Row_Number() and I've been using it ever since...

But now I have found a serious performance issue with Row_Number(). It performs very well on moderately sized result sets sorted by an identity column. However, it performs very poorly on large result sets (over 10,000 records) sorted by a non-identity column, and even sorting by an identity column becomes slow once the result set exceeds 250,000 records. For me, it has reached the point where the query fails with a "command timeout" error.


What do you use to paginate a large result set on SQL 2005? Is the temp table method still better in this case? I'm not sure whether this method, using a temp table with SET ROWCOUNT, will perform better... but some say it can return wrong row numbers if you have a multi-column primary key.
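A minimal sketch of the temp table method applied to the table from the question, assuming MyTable(postID, postTitle, postDate) exists as described further down. Only the keys are copied, and only as many rows as are needed to reach the end of the requested page. In SQL 2000 you would cap the copy with SET ROWCOUNT because TOP did not accept a variable; SQL 2005 accepts TOP (@n), which is used here. #PageKeys is a hypothetical name and the parameter values are placeholders. Sorting on (postDate DESC, postID DESC) keeps the numbering deterministic even though postDate alone is not unique.

```sql
-- Placeholder page parameters (SQL 2005 does not allow DECLARE ... = value).
DECLARE @startRowIndex int, @maximumRows int;
SET @startRowIndex = 1001;
SET @maximumRows = 50;

-- Hypothetical temp table: its IDENTITY column numbers the rows in page order.
CREATE TABLE #PageKeys
(
    RowNum int IDENTITY(1,1) PRIMARY KEY,
    postID int NOT NULL
);

-- Copy only the keys, and only up to the last row of the requested page.
-- For INSERT ... SELECT ... ORDER BY, SQL Server computes IDENTITY values
-- in ORDER BY order (the physical insert order is not guaranteed, and
-- does not matter here).
INSERT INTO #PageKeys (postID)
SELECT TOP (@startRowIndex + @maximumRows - 1) postID
FROM MyTable
ORDER BY postDate DESC, postID DESC;

-- Join back to the base table to fetch the columns for this page only.
SELECT t.postID, t.postTitle, t.postDate
FROM #PageKeys AS k
JOIN MyTable AS t ON t.postID = k.postID
WHERE k.RowNum BETWEEN @startRowIndex AND @startRowIndex + @maximumRows - 1
ORDER BY k.RowNum;

DROP TABLE #PageKeys;
```

The cost still grows with the page number, since every key up to the end of the page is copied, so this mainly helps when the base rows are wide and the keys are narrow.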

In my case, for my production web app, I need to be able to sort the result set by a date column.


Let me know what you use for high-performing pagination in SQL 2005. I'd also like to know a smart way of choosing indexes. I suspect that choosing the right primary keys and/or indexes (clustered/non-clustered) plays a big role here.

Thanks in advance.


P.S. Does anyone know what * uses?


EDIT: Mine looks something like...


SELECT postID, postTitle, postDate
FROM (SELECT postID, postTitle, postDate,
             ROW_NUMBER() OVER(ORDER BY postDate DESC, postID DESC) AS RowNum
      FROM MyTable
     ) AS DerivedMyTable
WHERE RowNum BETWEEN @startRowIndex AND (@startRowIndex + @maximumRows) - 1
ORDER BY RowNum  -- the outer query needs its own ORDER BY to return the page in order

postID: Int, Identity (auto-increment), Primary key


postDate: DateTime

EDIT: Is everyone using Row_Number()?


2 answers



Well, for your sample query ROW_NUMBER() should be pretty fast with thousands of rows, provided you have an index on your PostDate field. If you don't, the server needs to perform a complete clustered index scan on your PK: practically load every page, fetch your PostDate field, sort by it, determine the rows to extract for the result set, and then fetch those rows again. It's a bit like creating a temporary index over and over again (you might see a table/index spool in the plan).
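To check whether this is happening, you can look at the estimated plan. A sketch, assuming the question's MyTable: SET SHOWPLAN_TEXT ON returns the plan as text without executing the query, and it must be the only statement in its batch, hence the GO separators (GO is a batch separator understood by SSMS and sqlcmd, not a T-SQL statement). The page bounds are literals because variables do not survive across GO. Without a suitable index you would expect a Clustered Index Scan feeding a Sort; ROW_NUMBER() itself appears as Segment and Sequence Project operators.

```sql
-- Show the estimated plan as text instead of running the query.
SET SHOWPLAN_TEXT ON;
GO
SELECT postID, postTitle, postDate
FROM (SELECT postID, postTitle, postDate,
             ROW_NUMBER() OVER (ORDER BY postDate DESC, postID DESC) AS RowNum
      FROM MyTable) AS DerivedMyTable
WHERE RowNum BETWEEN 1001 AND 1050
ORDER BY RowNum;
GO
SET SHOWPLAN_TEXT OFF;
GO
```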


No wonder you get timeouts.


My suggestion: create an index on PostDate DESC, since that is the order ROW_NUMBER() walks through (ORDER BY PostDate DESC, ...).
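A sketch of that index for the question's table, assuming postTitle is a varchar/nvarchar column (INCLUDE does not accept text/ntext). The index name is hypothetical. postID is added as the second key column to match the tie-breaker in the question's ORDER BY, and INCLUDE (available from SQL Server 2005) carries postTitle so the query can be answered from this index alone, as a covering index, without lookups into the clustered index.

```sql
-- Nonclustered index whose key order matches ORDER BY postDate DESC, postID DESC;
-- postTitle is stored only at the leaf level so the paging query is covered.
CREATE NONCLUSTERED INDEX IX_MyTable_postDate_postID
    ON MyTable (postDate DESC, postID DESC)
    INCLUDE (postTitle);
```

Because both columns descend together, an ascending index on (postDate, postID) scanned backward should also serve this ORDER BY; DESC just makes the intent explicit.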

As for the article you are referring to: I did a lot of paging with SQL Server 2000 in the past, without ROW_NUMBER(), and the approach used in the article is the most efficient one. It does not work in all circumstances (you need unique or almost unique values). An overview of some other methods is here.
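Assuming the article uses the common SQL 2000 SET ROWCOUNT pattern (find the sort key of the first row on the page, then read one page forward from that key), the same idea can be sketched for the question's table with SQL 2005's TOP (@n) in place of SET ROWCOUNT. Because postDate is not unique, the sketch seeks on the pair (postDate, postID), which is one way to meet the "unique values" requirement. It also assumes postDate is NOT NULL; the variable values are placeholders.

```sql
DECLARE @startRowIndex int, @maximumRows int;
DECLARE @firstPostDate datetime, @firstPostID int;
SET @startRowIndex = 1001;
SET @maximumRows = 50;

-- Step 1: find the sort key of the first row on the requested page,
-- i.e. the last of the first @startRowIndex rows in page order.
SELECT TOP 1 @firstPostDate = postDate, @firstPostID = postID
FROM (SELECT TOP (@startRowIndex) postDate, postID
      FROM MyTable
      ORDER BY postDate DESC, postID DESC) AS FirstRows
ORDER BY postDate ASC, postID ASC;

-- Step 2: read one page forward from that key, comparing (postDate, postID)
-- as a pair so rows sharing the same postDate are neither skipped nor repeated.
SELECT TOP (@maximumRows) postID, postTitle, postDate
FROM MyTable
WHERE postDate < @firstPostDate
   OR (postDate = @firstPostDate AND postID <= @firstPostID)
ORDER BY postDate DESC, postID DESC;
```

Step 1 still walks @startRowIndex keys, but with an index on (postDate DESC, postID DESC) those are narrow index rows rather than full data pages. When the app already knows the last row the user saw (a "next page" link), step 1 can be skipped: pass that row's postDate and postID in directly and use < instead of <= so that row is excluded.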




The row_number() technique should be quick. I have seen good results for 100,000 rows.


Are you using row_number() similar to the following:


SELECT column_list
FROM (SELECT column_list,
             ROW_NUMBER() OVER(ORDER BY OrderByColumnName) AS RowNum
      FROM MyTable m
     ) AS DerivedTableName
WHERE RowNum BETWEEN @startRowIndex AND (@startRowIndex + @maximumRows) - 1

...and do you have a covering index for the column_list and/or an index on the 'OrderByColumnName' column?
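One way to answer that with numbers, sketched with the question's MyTable columns in place of the generic names above: SET STATISTICS IO ON and SET STATISTICS TIME ON make SQL Server report logical reads and CPU/elapsed time per statement in the Messages output. Running the same page request before and after adding an index shows whether the index is actually used. The page values are placeholders.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

DECLARE @startRowIndex int, @maximumRows int;
SET @startRowIndex = 250001;  -- a deep page, to mimic the large result sets in the question
SET @maximumRows = 50;

SELECT postID, postTitle, postDate
FROM (SELECT postID, postTitle, postDate,
             ROW_NUMBER() OVER (ORDER BY postDate DESC, postID DESC) AS RowNum
      FROM MyTable) AS DerivedMyTable
WHERE RowNum BETWEEN @startRowIndex AND @startRowIndex + @maximumRows - 1
ORDER BY RowNum;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```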