Trying to group on a column while selecting all the other columns according to a sort order

I'm having a bit of trouble constructing a query to use the following conditions:

Match against an org

Sorted by score (desc) and then by handle (asc)

Group on the type

So this query is my starting point:

select * from social_media_handles where org = '00000001' order by score desc, handle asc;

This gives me the following data, which I then need to group by type so that I pull out only the top-scoring row of social_media_handles for each type.

   org    |                            handle                             |                   url                   |   type   |      score      | dataset_date
----------+---------------------------------------------------------------+-----------------------------------------+----------+-----------------+--------------
 00000001 | boathousesw15                                                 | http://www.boathouseputney.co.uk        | twitter  | 500111972000056 | 2013-10-15
 00000001 | aspall                                                        | http://www.boathouseputney.co.uk        | twitter  | 500111972000018 | 2013-10-15
 00000001 | nathansloane                                                  | http://www.boathouseputney.co.uk        | twitter  | 500111972000018 | 2013-10-15
 00000001 | youngspubs                                                    | http://www.boathouseputney.co.uk        | twitter  | 500111972000018 | 2013-10-15
 00000001 | pages/the-boathouse-putney/153429008029137                    | http://www.boathouseputney.co.uk        | facebook | 500111972000011 | 2013-10-15
 00000001 | putneysocial                                                  | http://www.boathouseputney.co.uk        | twitter  | 500111972000009 | 2013-10-15
 00000001 | theexchangesw15                                               | http://www.boathouseputney.co.uk        | twitter  | 500111972000009 | 2013-10-15
 00000001 | youngspubs                                                    | http://www.youngshotels.co.uk           | twitter  | 500111970000016 | 2013-10-15

Expected output

   org    |                            handle                             |                   url                   |   type   |      score      | dataset_date
----------+---------------------------------------------------------------+-----------------------------------------+----------+-----------------+--------------
 00000001 | boathousesw15                                                 | http://www.boathouseputney.co.uk        | twitter  | 500111972000056 | 2013-10-15
 00000001 | pages/the-boathouse-putney/153429008029137                    | http://www.boathouseputney.co.uk        | facebook | 500111972000011 | 2013-10-15

I've tried group by, distinct and sub-queries, but didn't have much luck. Is there a pattern around this problem?

I am using Postgres and have this problem solved with distinct on, but I'm looking for a version which is compatible with different vendors.

2 Answers

#1

This problem comes up frequently on SO, and it usually is given the tag greatest-n-per-group (where n=1 in your case).

Here are a couple of common solutions that would work in MySQL:

SELECT h.*
FROM social_media_handles AS h
JOIN (
    SELECT type, MAX(score) AS score 
    FROM social_media_handles WHERE org = '00000001' 
    GROUP BY type) AS maxh USING (type, score)
WHERE org = '00000001' 
ORDER BY score DESC, handle ASC;
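
To try the join-on-MAX query without a MySQL server, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The table holds a trimmed copy of the question's rows (url and dataset_date are left out), the helper name top_per_type_join is made up for illustration, and the join is written with an explicit ON instead of USING, which should behave the same here.

```python
import sqlite3

# Trimmed copy of the question's sample rows (url and dataset_date omitted).
ROWS = [
    ("00000001", "boathousesw15", "twitter", 500111972000056),
    ("00000001", "aspall", "twitter", 500111972000018),
    ("00000001", "nathansloane", "twitter", 500111972000018),
    ("00000001", "pages/the-boathouse-putney/153429008029137", "facebook", 500111972000011),
    ("00000001", "putneysocial", "twitter", 500111972000009),
]


def top_per_type_join(conn, org):
    """Return (handle, type, score) for the highest-scoring row(s) of each type."""
    sql = """
        SELECT h.handle, h.type, h.score
        FROM social_media_handles AS h
        JOIN (
            SELECT type, MAX(score) AS score
            FROM social_media_handles
            WHERE org = ?
            GROUP BY type
        ) AS maxh ON maxh.type = h.type AND maxh.score = h.score
        WHERE h.org = ?
        ORDER BY h.score DESC, h.handle ASC
    """
    return conn.execute(sql, (org, org)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE social_media_handles (org TEXT, handle TEXT, type TEXT, score INTEGER)"
)
conn.executemany("INSERT INTO social_media_handles VALUES (?, ?, ?, ?)", ROWS)

result = top_per_type_join(conn, "00000001")
for row in result:
    print(row)
```

On this data the query returns one twitter row and one facebook row, matching the expected output in the question.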

The second solution uses no subquery or group-by. It tries to match a row h1 to a hypothetical row h2 with the same type and org, but with a higher score. If no such row h2 exists, then h1 must be the row with the highest score.

SELECT h1.*
FROM social_media_handles AS h1
LEFT OUTER JOIN social_media_handles AS h2
 ON h1.type = h2.type AND h1.org = h2.org AND h1.score < h2.score
WHERE h1.org = '00000001'
 AND h2.score IS NULL
ORDER BY h1.score DESC, h1.handle ASC;
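
The self-join version can be checked the same way. This is only a sketch on an in-memory SQLite database from Python: the first three rows come from the question, and the row for org 00000002 is invented to show why h1.org = h2.org belongs in the join condition.

```python
import sqlite3

ROWS = [
    ("00000001", "boathousesw15", "twitter", 500111972000056),
    ("00000001", "aspall", "twitter", 500111972000018),
    ("00000001", "pages/the-boathouse-putney/153429008029137", "facebook", 500111972000011),
    # Hypothetical row for another org, not part of the question's data.
    ("00000002", "hypothetical_handle", "twitter", 500111972000099),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE social_media_handles (org TEXT, handle TEXT, type TEXT, score INTEGER)"
)
conn.executemany("INSERT INTO social_media_handles VALUES (?, ?, ?, ?)", ROWS)

# h2 is "a row of the same type and org with a higher score".
# Rows of h1 that find no such h2 come back with h2.score IS NULL.
best = conn.execute(
    """
    SELECT h1.handle, h1.type, h1.score
    FROM social_media_handles AS h1
    LEFT OUTER JOIN social_media_handles AS h2
      ON h1.type = h2.type AND h1.org = h2.org AND h1.score < h2.score
    WHERE h1.org = ?
      AND h2.score IS NULL
    ORDER BY h1.score DESC, h1.handle ASC
    """,
    ("00000001",),
).fetchall()

for row in best:
    print(row)
```

Without the org comparison in the ON clause, the invented 00000002 row would outrank boathousesw15 and the twitter row for 00000001 would disappear from the result.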

Which solution is fastest? It depends. I have seen each one outperform the other, depending on the size of the dataset, the number of distinct types, and so on. So you should test both solutions and see which works better for your case.

The CTE solution shown by @Roman Pekar is also good for an RDBMS that supports CTE syntax. Those include PostgreSQL, Oracle, Microsoft SQL Server, IBM DB2, and several others.

When this answer was written, MySQL and SQLite were the only widely used databases that did not support CTE syntax. Both have since added it (MySQL in 8.0, SQLite in 3.8.3).

#2

There are a few ways to do this, all based on two ideas. The first idea is to get a recordset with the max score for each type and then join the original table to it. The second idea works if you have ranking functions: you use row_number() within each type and then filter out all records with row_number > 1.

So the first idea could be written like this:

select *
from Table1 as T
where
    exists (
        select 1
        from Table1 as TT
        where TT.type = T.type
        having max(TT.score) = T.score
    )

select T.*
from Table1 as T
    inner join (
        select max(TT.score) as score, TT.type
        from Table1 as TT
        group by TT.type
    ) as TT on TT.type = T.type and TT.score = T.score

If you have ranking functions, you can also use the second idea:

with cte as (
   select *, row_number() over(partition by type order by score desc) as rn
   from Table1
)
select *
from cte
where rn = 1

You can easily replace the common table expression with a subquery:

select *
from (
   select *, row_number() over(partition by type order by score desc) as rn
   from Table1
) as a
where rn = 1

update

One thing to mention: if you have more than one record with, for example, score = 500111972000056 and type = 'twitter', then the first solution will return more than one record for type = 'twitter', while the second one will return one arbitrary row for type = 'twitter'.
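
The difference on ties can be seen with a small sketch in Python and in-memory SQLite (window functions need SQLite 3.25 or later). The rows are a subset of the question's data in which three twitter handles share the top score. Adding handle asc to the window's order by is my assumption about how one might make the row_number() pick deterministic; it is not part of the queries above.

```python
import sqlite3

# Subset of the question's rows: three twitter handles tie on the top score.
ROWS = [
    ("aspall", "twitter", 500111972000018),
    ("nathansloane", "twitter", 500111972000018),
    ("youngspubs", "twitter", 500111972000018),
    ("pages/the-boathouse-putney/153429008029137", "facebook", 500111972000011),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (handle TEXT, type TEXT, score INTEGER)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", ROWS)

# First idea: join to the max score per type. Every tied row survives.
join_rows = conn.execute(
    """
    SELECT T.handle, T.type
    FROM Table1 AS T
    JOIN (SELECT type, MAX(score) AS score FROM Table1 GROUP BY type) AS M
      ON M.type = T.type AND M.score = T.score
    ORDER BY T.score DESC, T.handle ASC
    """
).fetchall()

# Second idea: row_number() keeps exactly one row per type.
# The extra "handle ASC" makes the choice among tied rows deterministic.
ranked_rows = conn.execute(
    """
    SELECT handle, type
    FROM (
        SELECT handle, type, score,
               ROW_NUMBER() OVER (
                   PARTITION BY type ORDER BY score DESC, handle ASC
               ) AS rn
        FROM Table1
    ) AS a
    WHERE rn = 1
    ORDER BY score DESC, handle ASC
    """
).fetchall()

print(len(join_rows), "rows from the join")
print(len(ranked_rows), "rows from row_number()")
```

Here the join returns all three tied twitter handles plus the facebook row, while the ranked query returns only aspall and the facebook row.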

Also, I forgot to mention a third idea (see @Bill Karwin's nice answer). I'll just add it here:

select *
from Table1 as T
where
    not exists (
        select *
        from Table1 as TT
        where TT.type = T.type and TT.score > T.score
    );
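
The query above ignores org, so here is a sketch of how it might be adapted to the question's org filter, run from Python against in-memory SQLite. Table1 is given an org column for this purpose, and the 00000002 row is invented to show that the subquery has to be correlated on org as well as type.

```python
import sqlite3

ROWS = [
    ("00000001", "boathousesw15", "twitter", 500111972000056),
    ("00000001", "aspall", "twitter", 500111972000018),
    ("00000001", "pages/the-boathouse-putney/153429008029137", "facebook", 500111972000011),
    # Hypothetical row for another org, not part of the question's data.
    ("00000002", "hypothetical_handle", "facebook", 500111972000099),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (org TEXT, handle TEXT, type TEXT, score INTEGER)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", ROWS)

# Keep a row only if no row of the same org and type has a higher score.
best = conn.execute(
    """
    SELECT T.handle, T.type
    FROM Table1 AS T
    WHERE T.org = ?
      AND NOT EXISTS (
          SELECT 1
          FROM Table1 AS TT
          WHERE TT.org = T.org
            AND TT.type = T.type
            AND TT.score > T.score
      )
    ORDER BY T.score DESC, T.handle ASC
    """,
    ("00000001",),
).fetchall()

for row in best:
    print(row)
```

If TT.org = T.org were dropped, the invented facebook row from the other org would suppress the facebook row for 00000001.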
