Retrieving the TOP X members of each group with SQL Server 2005

Date: 2023-02-05 01:43:03

I've seen this question asked a couple of times, and I've written my own query, but it's quite slow, and I would be extremely grateful if someone could offer advice on how to speed it up.


In a simplified scenario, I have the following two tables:


Group
- GroupID (primary key)

Member
- MemberID (primary key)
- GroupID (foreign key)

Let's say, for each GroupID in Group, I want to find the top 2 MemberID values from Member that have that GroupID.


Here's my current query that works, but is painfully slow:


SELECT M.MemberID, M.GroupID
FROM   Member AS M
WHERE  M.MemberID IN
        (SELECT TOP 2 Member.MemberID
         FROM   Member
         WHERE  Member.GroupID = M.GroupID
         ORDER BY Member.MemberID)

Say Group has the following rows:
GroupID
1
2
3

and Member has the following rows
MemberID, GroupID
1, 1
2, 2
3, 3
4, 1
5, 2
6, 3
7, 1
8, 2
9, 3


Then my query should return:
MemberID, GroupID
1, 1
2, 2
3, 3
4, 1
5, 2
6, 3
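
For anyone who wants to reproduce the example, here is a minimal setup sketch. The INT column types are an assumption (the question does not state them), and [Group] is bracketed because GROUP is a reserved word in T-SQL. SQL Server 2005 does not accept multi-row VALUES lists, hence the INSERT ... SELECT ... UNION ALL form.

```sql
-- Hypothetical setup for the simplified schema; column types are assumed.
CREATE TABLE [Group] (
    GroupID INT NOT NULL PRIMARY KEY
);

CREATE TABLE Member (
    MemberID INT NOT NULL PRIMARY KEY,
    GroupID  INT NOT NULL REFERENCES [Group] (GroupID)
);

-- Three groups, as in the question.
INSERT INTO [Group] (GroupID)
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3;

-- Nine members, three per group, as in the question.
INSERT INTO Member (MemberID, GroupID)
SELECT 1, 1 UNION ALL
SELECT 2, 2 UNION ALL
SELECT 3, 3 UNION ALL
SELECT 4, 1 UNION ALL
SELECT 5, 2 UNION ALL
SELECT 6, 3 UNION ALL
SELECT 7, 1 UNION ALL
SELECT 8, 2 UNION ALL
SELECT 9, 3;
```

With this data, any correct "top 2 per group" query should return MemberID 1 through 6 and leave out 7, 8 and 9.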


3 answers


I believe the dependent nested query might be really hard for the db engine to optimize well (though @John Saunders' request to see the execution plan is well founded, and seeing what indices you have would not hurt either;-).


But, a more natural approach to such ranking related problems in SQL Server 2005 and 2008 (and other SQL engines, since the feature is in recent ANSI standards) is ranking functions -- RANK, DENSE_RANK, or ROW_NUMBER... they're all equivalent when you're ranking by a unique field, anyway;-). Even apart from optimization, they're easier to read once you're used to them (and more powerful when your problems are harder than this one), especially with the help of that other neat new-ish construct, the WITH clause...:


WITH OrderedMembers AS
(
    SELECT MemberId, GroupId,
           ROW_NUMBER() OVER (PARTITION BY GroupId ORDER BY MemberId) AS RowNumber
    FROM Member
)
SELECT MemberId, GroupId
FROM OrderedMembers
WHERE RowNumber <= 2
ORDER BY MemberId;
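
To illustrate the remark that the three ranking functions agree when the ordering column is unique, here is a small sketch that puts them side by side. It assumes the Member table from the question; the column aliases are made up for the example.

```sql
-- MemberID is the primary key, so there are no ties within a group
-- and the three functions number the rows identically.
SELECT MemberID, GroupID,
       ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY MemberID) AS RowNum,
       RANK()       OVER (PARTITION BY GroupID ORDER BY MemberID) AS Rnk,
       DENSE_RANK() OVER (PARTITION BY GroupID ORDER BY MemberID) AS DenseRnk
FROM Member
ORDER BY GroupID, MemberID;
```

On the question's sample data, all three columns read 1, 2, 3 within each group. They only diverge when the ORDER BY column has ties: RANK leaves gaps after a tie, DENSE_RANK does not, and ROW_NUMBER breaks the tie arbitrarily.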


You could probably use the RANK function to do this, but it might not be any faster. That's because you don't know why your query is slow.


Why not go find out? Look at the execution plan and see whether there are table scans going on. Run the Query Optimizer and see what it has to say.


There's no reason to optimize until you know what's wrong.
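
For readers new to this, one way to gather that information from a query window is sketched below. It assumes you are running the batches in Management Studio or sqlcmd (GO is a batch separator for those tools, not a T-SQL statement) and uses the query from the question.

```sql
-- Report logical reads and CPU/elapsed time for the statements that follow.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO

-- The slow query from the question; it runs normally and the
-- figures are reported alongside the results.
SELECT M.MemberID, M.GroupID
FROM   Member AS M
WHERE  M.MemberID IN
        (SELECT TOP 2 Member.MemberID
         FROM   Member
         WHERE  Member.GroupID = M.GroupID
         ORDER BY Member.MemberID);
GO

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO

-- Show the estimated plan as text WITHOUT executing the query.
-- SET SHOWPLAN_TEXT must be the only statement in its batch.
SET SHOWPLAN_TEXT ON;
GO

SELECT M.MemberID, M.GroupID
FROM   Member AS M
WHERE  M.MemberID IN
        (SELECT TOP 2 Member.MemberID
         FROM   Member
         WHERE  Member.GroupID = M.GroupID
         ORDER BY Member.MemberID);
GO

SET SHOWPLAN_TEXT OFF;
GO
```

Comparing the logical reads before and after a change gives a more stable picture than wall-clock time alone.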



Thank you John and Alex for your replies. I'm pretty fresh out of school and very new to SQL Server, so the execution plan option was brand new to me. It reported that 96% of the query cost was being consumed by a Clustered Index Scan, which I'm assuming was a result of the nested query. Truth be told, I'm not quite sure what the next step would have been to optimize.
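
One possible next step, offered as an assumption and not something tested in this thread, would be a nonclustered index that leads with the grouping column, so the rows for one GroupID can be read together and already in MemberID order. The index name below is hypothetical.

```sql
-- Hypothetical index; GroupID comes first so each group's rows are adjacent,
-- and MemberID second so they are already sorted for the TOP 2 / ROW_NUMBER step.
CREATE NONCLUSTERED INDEX IX_Member_GroupID_MemberID
    ON Member (GroupID, MemberID);
```

Whether it actually helps depends on the real table and data, so it is worth re-checking the execution plan afterwards to see whether the scan is replaced by a seek on this index.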


Alex, the query you provided ran in the blink of an eye on my dataset.


Thank you again gentlemen, I really appreciate your assistance.



