I have an expensive query using the row_number over() functionality in SQL Server 2005. I return only a subset of those records, as the query is paginated. However, I would also like to return the total number of records, not just the paginated subset. Effectively running the query twice to get the count is out of the question.
Selecting count(*) is also out of the question as the performance is absolutely terrible when I've tried this.
What I'd really love is @@ROW_NUMBERROWCOUNT :-)
4 Answers
#1
36
Check out the COUNT(*) aggregate when used with OVER(PARTITION BY ...), like so:
SELECT
    ROW_NUMBER() OVER(ORDER BY object_id, column_id) as RowNum  -- sequential number used for paging
    , COUNT(*) OVER(PARTITION BY 1) as TotalRows                -- total row count, repeated on every row
    , *
FROM master.sys.columns
This is IMHO the best way to do it without having to do two queries.
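To actually page with this pattern you would typically wrap it in a CTE (or derived table) and filter on RowNum. A minimal sketch reusing the example above; the @PageNumber/@PageSize variables and the BETWEEN range are illustrative additions, not part of the original answer:

DECLARE @PageNumber int, @PageSize int;
SET @PageNumber = 2;
SET @PageSize = 20;

;WITH Paged AS
(
    SELECT
        ROW_NUMBER() OVER(ORDER BY object_id, column_id) as RowNum
        , COUNT(*) OVER(PARTITION BY 1) as TotalRows
        , name, object_id, column_id
    FROM master.sys.columns
)
SELECT RowNum, TotalRows, name, object_id, column_id
FROM Paged
WHERE RowNum BETWEEN (@PageNumber - 1) * @PageSize + 1 AND @PageNumber * @PageSize
ORDER BY RowNum;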
#2
36
Over the years a pile of developer sweat has gone into efficiently paging result sets. Yet there is no one answer; it depends on your use case. Part of the use case is getting your page efficiently, part is figuring out how many rows are in the complete result set. So sorry if I stray a little into paging, but the two are pretty tightly coupled in my mind.
There are a lot of strategies, most of which are bad if you have any sort of data volume and don't fit the use case. While this isn't a complete list, here are some of the options...
Run Separate Count(*)
- Run a separate query that does a simple "select count(*) from MyTable" (a sketch follows this list).
- Simple and easy for a small table.
- Good on an unfiltered large table that is either narrow or has a compact non-clustered index you can use.
- Breaks down when you have complicated WHERE/JOIN criteria, because running the WHERE/JOIN twice is expensive.
- Breaks down on a wide index because the number of reads goes up.
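A minimal sketch of the separate-count approach under assumed names (a hypothetical dbo.Orders table filtered by CustomerId); note that the WHERE clause has to run twice:

DECLARE @CustomerId int;
SET @CustomerId = 42;

-- Page query (page 2, 20 rows per page)
;WITH Paged AS
(
    SELECT ROW_NUMBER() OVER(ORDER BY OrderDate DESC, OrderId) AS RowNum,
           OrderId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId
)
SELECT OrderId, OrderDate
FROM Paged
WHERE RowNum BETWEEN 21 AND 40;

-- Separate count query over the same predicate
SELECT COUNT(*) AS TotalRows
FROM dbo.Orders
WHERE CustomerId = @CustomerId;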
Combine ROW_Number() OVER() and COUNT(1) OVER(PARTITION BY 1)
- This was suggested by @RBarryYoung. It has the benefit of being simple to implement and very flexible.
- The downside is that there are a lot of reasons this can become extremely expensive quickly.
- For example, in a DB I'm currently working in there is a Media table with about 6000 rows. It's not particularly wide, and has an integer clustered PK as well as a compact unique index. Yet a simple COUNT(*) OVER(PARTITION BY 1) as TotalRows results in ~12,000 reads. Compare that to a simple SELECT COUNT(*) FROM Media -- 12 reads. Wowzers. (A way to reproduce this comparison is sketched after this section.)
UPDATE -- the reads issue I mentioned is a bit of a red herring. It turns out that with windowed functions the unit used to measure reads is kind of mixed, and the net result is what appears to be a massive number of reads. You can see more on the issue here: Why are logical reads for windowed aggregate functions so high?
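If you want to reproduce that kind of comparison on your own table, SET STATISTICS IO reports the logical reads for each form (Media is the table from the example above; substitute your own):

SET STATISTICS IO ON;

-- Windowed count: reported reads can look very high (see the linked question above)
SELECT COUNT(*) OVER(PARTITION BY 1) AS TotalRows
FROM dbo.Media;

-- Plain aggregate for comparison
SELECT COUNT(*) AS TotalRows
FROM dbo.Media;

SET STATISTICS IO OFF;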
Temp Tables / Table Variables
- There are lots of strategies that take a result set and insert relevant keys or segments of results into temp tables / table variables (one variant is sketched after this list).
- For small/medium sized result sets this can provide great results.
- This type of strategy works across almost any platform/version of SQL.
- Operating on a result set multiple times (quite often a requirement) is also easy.
- The downside is when working with large result sets ... inserting a few million rows into a temp table has a cost.
- Compounding the issue, in a high-volume system pressure on TempDB can be quite a factor, and temp tables effectively live in TempDB.
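A minimal sketch of the key-staging variant, again on the hypothetical dbo.Orders table: stage the matching keys once, capture @@ROWCOUNT for the total, then join back for one page.

DECLARE @CustomerId int, @TotalRows int;
SET @CustomerId = 42;

CREATE TABLE #Keys
(
    RowNum  int IDENTITY(1,1) PRIMARY KEY,
    OrderId int NOT NULL
);

-- The ORDER BY controls the identity (paging) order
INSERT INTO #Keys (OrderId)
SELECT OrderId
FROM dbo.Orders
WHERE CustomerId = @CustomerId
ORDER BY OrderDate DESC, OrderId;

SET @TotalRows = @@ROWCOUNT;   -- total rows matching the filter

SELECT o.OrderId, o.OrderDate, @TotalRows AS TotalRows
FROM #Keys k
JOIN dbo.Orders o ON o.OrderId = k.OrderId
WHERE k.RowNum BETWEEN 21 AND 40   -- page 2, 20 rows per page
ORDER BY k.RowNum;

DROP TABLE #Keys;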
Gaussian Sum / Double Row Number
- This idea relies on a subset of something the mathematician Gauss figured out (how to sum a series of numbers). The subset is how to get a row count from any point in the table.
- From a series of numbers (Row_Number()), the row count for 1 to N is (N + 1) - 1. More explanation in the links.
- The formula seems like it would net out to just N, but if you stick with the formula an interesting thing happens: you can figure out the row count from a page in the middle of the table.
- The net result is you do ROW_Number() OVER(Order by ID) and ROW_Number() OVER(Order by ID DESC), then sum the two numbers and subtract 1 (see the sketch after this list).
- Using my Media table as an example, my reads dropped from 12,000 to about 75.
- On a larger page you end up repeating data many, many times, but the trade-off in reads may be worth it.
- I haven't tested this on too many scenarios, so it may fall apart in other scenarios.
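A minimal sketch of the double row number trick on the hypothetical dbo.Orders table: for any given row, the ascending and descending row numbers always add up to the total plus one, so every row (even on a middle page) can compute the total locally.

;WITH Paged AS
(
    SELECT
        ROW_NUMBER() OVER(ORDER BY OrderId ASC)  AS RowAsc,
        ROW_NUMBER() OVER(ORDER BY OrderId DESC) AS RowDesc,
        OrderId, OrderDate
    FROM dbo.Orders
)
SELECT
    OrderId,
    OrderDate,
    RowAsc + RowDesc - 1 AS TotalRows   -- holds for every row returned
FROM Paged
WHERE RowAsc BETWEEN 21 AND 40          -- page 2, 20 rows per page
ORDER BY RowAsc;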
Top (@n) / SET ROWCOUNT
- These aren't specific strategies per se, but optimizations based on what we know about the query optimizer.
- Creatively using Top(@n) [top can be a variable in SQL 2008] or SET ROWCOUNT can reduce your working set ... even if you're pulling a middle page of a result set, you can still narrow the result (a sketch follows this list).
- These ideas work because of query optimizer behavior ... a service pack/hotfix could change the behavior (although it probably won't).
- In certain instances SET ROWCOUNT can be a bit inaccurate.
- This strategy doesn't account for getting the full row count; it just makes paging more efficient.
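One hedged illustration of using TOP to narrow the working set while still paging (hypothetical dbo.Orders again): only the first pages' worth of rows ever needs to be numbered.

;WITH Paged AS
(
    SELECT TOP 40   -- 2 pages x 20 rows; per the note above, TOP can also take a variable/expression
        ROW_NUMBER() OVER(ORDER BY OrderDate DESC, OrderId) AS RowNum,
        OrderId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = 42
    ORDER BY OrderDate DESC, OrderId
)
SELECT OrderId, OrderDate
FROM Paged
WHERE RowNum > 20;   -- keep only the second page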
So what's a developer to do?
Read, my good man, read. Here are some articles that I've leaned on...
- A More Efficient Method for Paging Through Large Result Sets
- Optimising Server-Side Paging - Part I
- Optimising Server-Side Paging - Part II
- Explanation of the Gaussian Sum
- Returning Ranked Results with Microsoft SQL Server 2005
- ROW_NUMBER() OVER Not Fast Enough With Large Result Set
- Retrieving the First N Records from a SQL Query
- Server Side Paging using SQL Server 2005
- Why are logical reads for windowed aggregate functions so high?
Hope that helps.
#3
4
If count(*) is slow, you really need to address that issue first by carefully examining your indexes and making sure your statistics are up to date.
In my experience, there is nothing better than doing two separate queries: one to get the data page, and one to get the total count. Using a temporary table to get total counts is a losing strategy as your number of rows increases. E.g., the cost of inserting 10,000,000 rows into a temp table simply to count them is obviously going to be excessive.
#4
0
I do this by putting the whole result set, with the row_number, into a temp table, then using @@ROWCOUNT from that insert for the total, and querying the temp table to return the page of data I need (see the sketch below).
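A minimal sketch of this answer's approach, again under assumed names (the hypothetical dbo.Orders table); it is essentially the temp-table strategy from answer #2, but staging the full rows rather than just the keys:

DECLARE @TotalRows int;

SELECT ROW_NUMBER() OVER(ORDER BY OrderDate DESC, OrderId) AS RowNum,
       OrderId, OrderDate, CustomerId
INTO #Results
FROM dbo.Orders
WHERE CustomerId = 42;

SET @TotalRows = @@ROWCOUNT;   -- total rows in the full result set

SELECT RowNum, OrderId, OrderDate, @TotalRows AS TotalRows
FROM #Results
WHERE RowNum BETWEEN 21 AND 40   -- page 2, 20 rows per page
ORDER BY RowNum;

DROP TABLE #Results;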