What is an efficient way to paginate very large result sets in SQL Server 2005?

EDIT: I'm still waiting for more answers. Thanks!

Back in the SQL 2000 days, I used the temp table method: create a temp table with a new identity column and primary key, then select the rows whose identity value falls between A and B.
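
For anyone who has not seen that temp table method, here is a rough sketch of what I mean. The table and column names (MyTable, postID, postTitle, postDate) are borrowed from my query further down; #PagedPosts, rowID and the sample parameter values are made up for illustration.

```sql
-- Sketch of the SQL 2000-style temp table paging method.
-- #PagedPosts and rowID are hypothetical names; MyTable and its columns
-- are assumed to match the query shown later in this question.
DECLARE @startRowIndex int, @maximumRows int;
SET @startRowIndex = 21;   -- first row of the page (1-based)
SET @maximumRows   = 20;   -- page size

CREATE TABLE #PagedPosts (
    rowID  int IDENTITY(1,1) PRIMARY KEY,  -- sequential position in the sort order
    postID int NOT NULL
);

-- Number every row in the desired sort order.
INSERT INTO #PagedPosts (postID)
SELECT postID
FROM MyTable
ORDER BY postDate DESC, postID DESC;

-- Pull only the rows for the requested page.
SELECT t.postID, t.postTitle, t.postDate
FROM #PagedPosts p
JOIN MyTable t ON t.postID = p.postID
WHERE p.rowID BETWEEN @startRowIndex AND (@startRowIndex + @maximumRows) - 1
ORDER BY p.rowID;

DROP TABLE #PagedPosts;
```

Note that this copies a key for every row in the table on each request, so the cost grows with the table, not with the page.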

When SQL 2005 came along I found out about Row_Number() and I've been using it ever since...

But now I have found a serious performance issue with Row_Number(). It performs very well when you are working with not-so-gigantic result sets and sorting on an identity column. However, it performs very poorly when you are working with large result sets (over 10,000 records) and sorting on a non-identity column. Even when sorting by an identity column, Row_Number() performs poorly once the result set exceeds 250,000 records. For me, it got to the point where it throws a "command timeout!" error.

What do you use to paginate a large result set on SQL 2005? Is the temp table method still better in this case? I'm not sure whether this method using a temp table with SET ROWCOUNT will perform better... But some say it can give wrong row numbers if you have a multi-column primary key.

In my case, I need to be able to sort the result set by a date type column... for my production web app.

Let me know what you use for high-performing pagination in SQL 2005. I'd also like to know a smart way of creating indexes. I suspect that choosing the right primary keys and/or indexes (clustered/non-clustered) will play a big role here.

Thanks in advance.

P.S. Does anyone know what * uses?

EDIT: Mine looks something like...

SELECT postID, postTitle, postDate
FROM
   (SELECT postID, postTitle, postDate, 
         ROW_NUMBER() OVER(ORDER BY postDate DESC, postID DESC) as RowNum
    FROM MyTable
   ) as DerivedMyTable
WHERE RowNum BETWEEN @startRowIndex AND (@startRowIndex + @maximumRows) - 1

postID: Int, Identity (auto-increment), Primary key

postDate: DateTime

EDIT: Is everyone using Row_Number()?

2 Answers

#1

Well, for your sample query ROW_NUMBER should be pretty fast with thousands of rows, provided you have an index on your PostDate field. If you don't, the server needs to perform a complete clustered index scan on your PK: it practically loads every page, fetches the PostDate field, sorts by it, determines the rows to extract for the result set and then fetches those rows again. It's kind of like creating a temp index over and over again (you might see a table/index spool in the plan).

No wonder you get timeouts.

My suggestion: set an index on PostDate DESC, this is what ROW_NUMBER will go over - (ORDER BY PostDate DESC, ...)
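
Such an index could be created along these lines. This is only a sketch: MyTable, postDate and postID come from the question's query, while the index name is made up. I have added postID as a second key because the question's ORDER BY uses it as a tie-breaker.

```sql
-- Sketch only: the index name IX_MyTable_postDate is hypothetical.
-- The key order mirrors ORDER BY postDate DESC, postID DESC so that
-- ROW_NUMBER() can read rows in index order instead of sorting them.
CREATE NONCLUSTERED INDEX IX_MyTable_postDate
    ON MyTable (postDate DESC, postID DESC);
```

With this in place the numbering itself should no longer need a sort, although postTitle still has to be looked up in the clustered index. Compare the execution plan before and after to confirm.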

As for the article you are referring to: I've done quite a lot of paging with SQL Server 2000 in the past without ROW_NUMBER, and the approach used in the article is the most efficient one. It does not work in all circumstances (you need unique or almost unique values). An overview of some other methods is here.
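
As a rough illustration of the SET ROWCOUNT idea (my own sketch of the general technique, not the article's code), assuming the question's MyTable with postID as a unique tie-breaker and a 1-based @startRowIndex; the variable names @firstDate and @firstID are made up:

```sql
-- Sketch of SET ROWCOUNT paging; @firstDate and @firstID are hypothetical names.
-- Assumes @startRowIndex >= 1 (SET ROWCOUNT 0 would mean "no limit").
DECLARE @startRowIndex int, @maximumRows int;
SET @startRowIndex = 21;
SET @maximumRows   = 20;

DECLARE @firstDate datetime, @firstID int;

-- Step 1: read only the first @startRowIndex rows in sort order. The variables
-- are overwritten once per row, so they end up holding the last row read, i.e.
-- the first row of the requested page. This relies on assignment following the
-- ORDER BY, which is commonly relied upon but not formally guaranteed.
SET ROWCOUNT @startRowIndex;
SELECT @firstDate = postDate, @firstID = postID
FROM MyTable
ORDER BY postDate DESC, postID DESC;

-- Step 2: fetch one page starting from that row.
SET ROWCOUNT @maximumRows;
SELECT postID, postTitle, postDate
FROM MyTable
WHERE postDate < @firstDate
   OR (postDate = @firstDate AND postID <= @firstID)
ORDER BY postDate DESC, postID DESC;

-- Always reset, otherwise later statements on this connection stay limited.
SET ROWCOUNT 0;
```

The tie-breaker on postID is what makes the sort key unique here; without it, rows sharing the same postDate could be skipped or repeated between pages.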

#2

The row_number() technique should be quick. I have seen good results for 100,000 rows.

Are you using row_number() similar to the following:

SELECT column_list
FROM
   (SELECT column_list,
         ROW_NUMBER() OVER(ORDER BY OrderByColumnName) as RowNum
    FROM MyTable m
   ) as DerivedTableName
WHERE RowNum BETWEEN @startRowIndex AND (@startRowIndex + @maximumRows) - 1

...and do you have a covering index for the column_list and/or an index on the 'OrderByColumnName' column?
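
By a covering index I mean something like the sketch below, written against the asker's query. The index name is made up, and I am assuming postTitle has a type that is allowed in INCLUDE (for example varchar or nvarchar, not text or ntext).

```sql
-- Sketch only: IX_MyTable_postDate_covering is a hypothetical name.
-- Keys match the ORDER BY; INCLUDE (available from SQL Server 2005) stores
-- postTitle at the leaf level so the query can be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_MyTable_postDate_covering
    ON MyTable (postDate DESC, postID DESC)
    INCLUDE (postTitle);
```

The trade-off is a wider index and slower writes. If a narrower index on the same keys already exists, this one would make it redundant.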
