Retrieving the TOP X members of each group with SQL Server 2005

I've seen this question asked a couple of times, and I've written my own query, but it's quite slow. I would be extremely grateful if someone could offer advice on how to speed it up.

In a simplified scenario, I have the following two tables:

Group
- GroupID (primary key)

Member
- MemberID (primary key)
- GroupID (foreign key)

Let's say that, for each GroupID in Group, I want to find the top 2 MemberID values (ordered by MemberID) among the Member rows that have that GroupID.

Here's my current query that works, but is painfully slow:

SELECT M.MemberID, M.GroupID
FROM   Member AS M
WHERE  M.MemberID IN
       (SELECT TOP 2 Member.MemberID
        FROM   Member
        WHERE  Member.GroupID = M.GroupID
        ORDER  BY Member.MemberID);

Say Group has the following rows:
GroupID
1
2
3

And Member has the following rows:
MemberID, GroupID
1, 1
2, 2
3, 3
4, 1
5, 2
6, 3
7, 1
8, 2
9, 3

Then my query should return:
MemberID, GroupID
1, 1
2, 2
3, 3
4, 1
5, 2
6, 3
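
To reproduce this outside SQL Server, here is a minimal sketch that loads the sample rows above into an in-memory SQLite database through Python's standard sqlite3 module and runs the same correlated subquery. SQLite is only a stand-in for SQL Server 2005 here: it has no TOP, so TOP 2 becomes LIMIT 2; Group is quoted because it is a reserved word; and an outer ORDER BY is added only so the output order is deterministic. The names MEMBERS, build_db and CORRELATED_TOP2 are hypothetical.

```python
import sqlite3

# Sample rows from the question, as (MemberID, GroupID) pairs.
MEMBERS = [(1, 1), (2, 2), (3, 3), (4, 1), (5, 2), (6, 3), (7, 1), (8, 2), (9, 3)]

# The question's correlated subquery, with TOP 2 rewritten as SQLite's LIMIT 2.
# The outer ORDER BY is added only to make the output order deterministic.
CORRELATED_TOP2 = """
SELECT M.MemberID, M.GroupID
FROM   Member AS M
WHERE  M.MemberID IN
       (SELECT Member.MemberID
        FROM   Member
        WHERE  Member.GroupID = M.GroupID
        ORDER  BY Member.MemberID
        LIMIT  2)
ORDER  BY M.MemberID
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory SQLite stand-in for the Group/Member tables."""
    conn = sqlite3.connect(":memory:")
    # GROUP is a reserved word, so the table name is quoted (SQL Server would use [Group]).
    conn.execute('CREATE TABLE "Group" (GroupID INTEGER PRIMARY KEY)')
    conn.execute(
        "CREATE TABLE Member ("
        "MemberID INTEGER PRIMARY KEY, "
        'GroupID INTEGER NOT NULL REFERENCES "Group" (GroupID))'
    )
    conn.executemany('INSERT INTO "Group" VALUES (?)', [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO Member VALUES (?, ?)", MEMBERS)
    return conn


if __name__ == "__main__":
    for row in build_db().execute(CORRELATED_TOP2):
        print(row)
```

Run as a script, it prints the six (MemberID, GroupID) rows listed above, in MemberID order.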

3 Solutions

#1

I believe the dependent (correlated) nested query may be really hard for the DB engine to optimize well (though @John Saunders' request to see the execution plan is well founded, and seeing what indexes you have would not hurt either ;-).

But a more natural approach to such ranking-related problems in SQL Server 2005 and 2008 (and in other SQL engines, since the feature is in recent ANSI standards) is the ranking functions: RANK, DENSE_RANK, or ROW_NUMBER. They're all equivalent anyway when you're ranking by a unique field ;-). Even apart from optimization, they're easier to read once you're used to them (and more powerful when your problems are harder than this one), especially with the help of that other neat, new-ish construct, the WITH clause (a common table expression):

-- Number the members within each group, lowest MemberId first
WITH OrderedMembers AS
(
    SELECT MemberId, GroupId,
           ROW_NUMBER() OVER (PARTITION BY GroupId ORDER BY MemberId) AS RowNumber
    FROM Member
)
-- Keep only the first 2 members of each group
SELECT MemberId, GroupId
FROM OrderedMembers
WHERE RowNumber <= 2
ORDER BY MemberId;
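
The ROW_NUMBER approach can be checked the same way. This sketch again uses Python's sqlite3 module with an in-memory database as a stand-in for SQL Server (window functions need SQLite 3.25 or later). It binds the cutoff as a parameter so the same CTE gives the top N per group, and it computes ROW_NUMBER, RANK and DENSE_RANK side by side to confirm they agree when ordering by the unique MemberId. The helper names (build_db, top_n_per_group, ranking_functions_agree) are hypothetical.

```python
import sqlite3

# Sample rows from the question, as (MemberID, GroupID) pairs.
MEMBERS = [(1, 1), (2, 2), (3, 3), (4, 1), (5, 2), (6, 3), (7, 1), (8, 2), (9, 3)]

# The answer's CTE, with the hard-coded 2 replaced by a bound parameter.
TOP_N_PER_GROUP = """
WITH OrderedMembers AS
(
    SELECT MemberId, GroupId,
           ROW_NUMBER() OVER (PARTITION BY GroupId ORDER BY MemberId) AS RowNumber
    FROM Member
)
SELECT MemberId, GroupId
FROM OrderedMembers
WHERE RowNumber <= ?
ORDER BY MemberId
"""

# The three ranking functions computed over the same partitioning and ordering.
ALL_RANKS = """
SELECT ROW_NUMBER() OVER (PARTITION BY GroupId ORDER BY MemberId),
       RANK()       OVER (PARTITION BY GroupId ORDER BY MemberId),
       DENSE_RANK() OVER (PARTITION BY GroupId ORDER BY MemberId)
FROM Member
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory Member table (window functions need SQLite 3.25+)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Member (MemberID INTEGER PRIMARY KEY, GroupID INTEGER NOT NULL)")
    conn.executemany("INSERT INTO Member VALUES (?, ?)", MEMBERS)
    return conn


def top_n_per_group(conn: sqlite3.Connection, n: int) -> list:
    """Return the n lowest-MemberId rows of every group, ordered by MemberId."""
    return conn.execute(TOP_N_PER_GROUP, (n,)).fetchall()


def ranking_functions_agree(conn: sqlite3.Connection) -> bool:
    """True if ROW_NUMBER, RANK and DENSE_RANK give the same number for every row."""
    return all(rn == rk == drk for rn, rk, drk in conn.execute(ALL_RANKS))


if __name__ == "__main__":
    conn = build_db()
    print(top_n_per_group(conn, 2))
    print(ranking_functions_agree(conn))
```

Because MemberId is unique, no two rows in a partition tie, so all three functions number each group's rows 1, 2, 3. With a non-unique ORDER BY column, RANK and DENSE_RANK could return more than N rows per group.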

#2

You could probably use the RANK function to do this, but it might not be any faster, and you can't tell in advance, because you don't know why your query is slow.

Why not go find out? Look at the execution plan. Are there table scans going on? Run the query through the Database Engine Tuning Advisor and see what it has to say.

There's no reason to optimize until you know what's wrong.
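
As a rough analogue of reading the execution plan, the sketch below asks SQLite (via Python's sqlite3 module) for its query plan of the correlated subquery before and after adding an index on (GroupID, MemberID). This is an illustration under assumptions: SQLite's EXPLAIN QUERY PLAN output is not SQL Server's plan (in SQL Server 2005 you would use the graphical plan in Management Studio or SET SHOWPLAN_TEXT ON), and the index name IX_Member_GroupID_MemberID and the helpers build_db and plan_details are hypothetical. The point is only that a composite index lets the inner query seek on GroupID and read rows already in MemberID order instead of scanning the whole table.

```python
import sqlite3

# Sample rows from the question, as (MemberID, GroupID) pairs.
MEMBERS = [(1, 1), (2, 2), (3, 3), (4, 1), (5, 2), (6, 3), (7, 1), (8, 2), (9, 3)]

# The question's correlated subquery (TOP 2 rewritten as SQLite's LIMIT 2).
CORRELATED_TOP2 = """
SELECT M.MemberID, M.GroupID
FROM   Member AS M
WHERE  M.MemberID IN
       (SELECT Member.MemberID
        FROM   Member
        WHERE  Member.GroupID = M.GroupID
        ORDER  BY Member.MemberID
        LIMIT  2)
"""

# Hypothetical supporting index: seek on GroupID, rows already ordered by MemberID.
CREATE_INDEX = "CREATE INDEX IX_Member_GroupID_MemberID ON Member (GroupID, MemberID)"


def build_db() -> sqlite3.Connection:
    """Create an in-memory Member table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Member (MemberID INTEGER PRIMARY KEY, GroupID INTEGER NOT NULL)")
    conn.executemany("INSERT INTO Member VALUES (?, ?)", MEMBERS)
    return conn


def plan_details(conn: sqlite3.Connection, sql: str) -> list:
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


if __name__ == "__main__":
    conn = build_db()
    print("before:", plan_details(conn, CORRELATED_TOP2))
    conn.execute(CREATE_INDEX)
    print("after: ", plan_details(conn, CORRELATED_TOP2))
```

The exact wording of SQLite's plan text varies between versions, so it is safest to look only for the index name rather than a full plan line.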

#3

Thank you John and Alex for your replies. I'm pretty fresh out of school and very new to SQL Server, so the execution plan option was brand new to me. It reported that 96% of the query cost was being consumed by a Clustered Index Scan, which I'm assuming was a result of the nested query. Truth be told, I'm not quite sure what the next optimization step would have been.

Alex, the query you provided ran in the blink of an eye on my dataset.

Thank you again, gentlemen; I really appreciate your assistance.
