
Time: 2021-09-04 12:32:54

I'm having a bit of trouble constructing a query that meets the following conditions:


  1. Match against an org
  2. Sorted by score (desc) and then by handle (asc)
  3. Group on the type

So this query is my starting point:


select * from social_media_handles where org = '00000001' order by score desc, handle asc;

This gives me the following data, which I then need to group by type so that I only pull out the top-scoring social_media_handles row for each type.


   org    |                            handle                             |                   url                   |   type   |      score      | dataset_date
 00000001 | boathousesw15                                                 | http://www.boathouseputney.co.uk        | twitter  | 500111972000056 | 2013-10-15
 00000001 | aspall                                                        | http://www.boathouseputney.co.uk        | twitter  | 500111972000018 | 2013-10-15
 00000001 | nathansloane                                                  | http://www.boathouseputney.co.uk        | twitter  | 500111972000018 | 2013-10-15
 00000001 | youngspubs                                                    | http://www.boathouseputney.co.uk        | twitter  | 500111972000018 | 2013-10-15
 00000001 | pages/the-boathouse-putney/153429008029137                    | http://www.boathouseputney.co.uk        | facebook | 500111972000011 | 2013-10-15
 00000001 | putneysocial                                                  | http://www.boathouseputney.co.uk        | twitter  | 500111972000009 | 2013-10-15
 00000001 | theexchangesw15                                               | http://www.boathouseputney.co.uk        | twitter  | 500111972000009 | 2013-10-15
 00000001 | youngspubs                                                    | http://www.youngshotels.co.uk           | twitter  | 500111970000016 | 2013-10-15

Expected output

   org    |                            handle                             |                   url                   |   type   |      score      | dataset_date
 00000001 | boathousesw15                                                 | http://www.boathouseputney.co.uk        | twitter  | 500111972000056 | 2013-10-15
 00000001 | pages/the-boathouse-putney/153429008029137                    | http://www.boathouseputney.co.uk        | facebook | 500111972000011 | 2013-10-15

I've tried group by, distinct and sub-queries, but didn't have much luck. Is there a pattern around this problem?


I am using Postgres and have this problem solved with distinct on, but I'm looking for a version which is compatible with different vendors.


2 solutions



This problem comes up frequently on Stack Overflow, and it is usually given the greatest-n-per-group tag (where n = 1 in your case).


Here are a couple of common solutions that would work in MySQL:


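-- Keep only the rows whose (type, score) matches the highest score for that type
SELECT h.*
FROM social_media_handles AS h
INNER JOIN (
    SELECT type, MAX(score) AS score
    FROM social_media_handles WHERE org = '00000001'
    GROUP BY type) AS maxh USING (type, score)
WHERE org = '00000001'
ORDER BY score DESC, handle ASC;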
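To check the first query against the question's sample data, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. It assumes a simplified `social_media_handles` table with only the `org`, `handle`, `type` and `score` columns (`url` and `dataset_date` are left out), and the helper name `top_per_type` is hypothetical.

```python
import sqlite3

# Sample rows from the question (url and dataset_date omitted for brevity).
ROWS = [
    ("00000001", "boathousesw15", "twitter", 500111972000056),
    ("00000001", "aspall", "twitter", 500111972000018),
    ("00000001", "nathansloane", "twitter", 500111972000018),
    ("00000001", "youngspubs", "twitter", 500111972000018),
    ("00000001", "pages/the-boathouse-putney/153429008029137", "facebook", 500111972000011),
    ("00000001", "putneysocial", "twitter", 500111972000009),
    ("00000001", "theexchangesw15", "twitter", 500111972000009),
    ("00000001", "youngspubs", "twitter", 500111970000016),
]

# The first query from the answer, selecting only the simplified columns.
QUERY = """
SELECT h.org, h.handle, h.type, h.score
FROM social_media_handles AS h
INNER JOIN (
    SELECT type, MAX(score) AS score
    FROM social_media_handles WHERE org = '00000001'
    GROUP BY type) AS maxh USING (type, score)
WHERE org = '00000001'
ORDER BY score DESC, handle ASC
"""


def top_per_type(rows):
    """Load rows into an in-memory table and return the top-scoring row per type."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE social_media_handles "
            "(org TEXT, handle TEXT, type TEXT, score INTEGER)"
        )
        conn.executemany("INSERT INTO social_media_handles VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_per_type(ROWS):
        print(row)
```

On this data it should return the two rows listed under "Expected output" in the question (one twitter row, one facebook row).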

The second solution uses no subquery or GROUP BY. It tries to match a row h1 to a hypothetical row h2 with the same type and org, but with a higher score. If no such row h2 exists, then h1 must be the row with the highest score for its type.


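SELECT h1.*
FROM social_media_handles AS h1
-- Look for any row h2 of the same org and type with a higher score
LEFT OUTER JOIN social_media_handles AS h2
 ON h1.type = h2.type AND h1.org = h2.org AND h1.score < h2.score
WHERE h1.org = '00000001'
 AND h2.score IS NULL  -- no higher-scoring row found, so h1 is the top row
ORDER BY h1.score DESC, h1.handle ASC;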

Which solution is fastest? It depends. I have seen each one perform better, depending on the size of the dataset, the number of distinct types, etc., so you should test both solutions and see which works better for your case.
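As a starting point for that kind of test, here is a rough harness in Python with `sqlite3` that runs both queries on the same table, checks that they return the same rows, and times them. The data is synthetic and hypothetical (random orgs, types and scores), the index on `(org, type, score)` is an assumption, and SQLite timings will not necessarily carry over to other databases; the helpers `make_db` and `time_query` are illustrative names only.

```python
import random
import sqlite3
import time

# Both queries from the answer, parameterised on org and reduced to four columns.
JOIN_TO_MAX = """
SELECT h.org, h.handle, h.type, h.score
FROM social_media_handles AS h
INNER JOIN (
    SELECT type, MAX(score) AS score
    FROM social_media_handles WHERE org = :org
    GROUP BY type) AS maxh USING (type, score)
WHERE org = :org
ORDER BY score DESC, handle ASC
"""

ANTI_JOIN = """
SELECT h1.org, h1.handle, h1.type, h1.score
FROM social_media_handles AS h1
LEFT OUTER JOIN social_media_handles AS h2
  ON h1.type = h2.type AND h1.org = h2.org AND h1.score < h2.score
WHERE h1.org = :org
  AND h2.score IS NULL
ORDER BY h1.score DESC, h1.handle ASC
"""


def make_db(n_rows, n_orgs, n_types, seed=0):
    """Build an in-memory table filled with synthetic rows (unique handles)."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE social_media_handles "
        "(org TEXT, handle TEXT, type TEXT, score INTEGER)"
    )
    rows = [
        (
            f"{rng.randrange(n_orgs):08d}",  # orgs like "00000001"
            f"handle{i}",
            f"type{rng.randrange(n_types)}",
            rng.randrange(1000),  # small range, so ties at the top are likely
        )
        for i in range(n_rows)
    ]
    conn.executemany("INSERT INTO social_media_handles VALUES (?, ?, ?, ?)", rows)
    # Assumed index; try with and without it on your own data.
    conn.execute(
        "CREATE INDEX idx_org_type_score ON social_media_handles (org, type, score)"
    )
    return conn


def time_query(conn, sql, org):
    """Run one query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql, {"org": org}).fetchall()
    return rows, time.perf_counter() - start


if __name__ == "__main__":
    conn = make_db(n_rows=20_000, n_orgs=10, n_types=5)
    joined, t_join = time_query(conn, JOIN_TO_MAX, "00000001")
    anti, t_anti = time_query(conn, ANTI_JOIN, "00000001")
    assert joined == anti, "both queries should return the same rows"
    print(f"join to max: {t_join:.4f}s, anti-join: {t_anti:.4f}s, rows: {len(joined)}")
    conn.close()
```

Both queries keep every row that ties for the top score, which is why their results can be compared directly.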


The CTE solution shown by @Roman Pekar is also good for an RDBMS that supports CTE syntax. Those include PostgreSQL, Oracle, Microsoft SQL Server, IBM DB2, and several others.


When this answer was written, MySQL and SQLite were the only widely used databases that didn't support CTE syntax; both have since added it (SQLite in 3.8.3, MySQL in 8.0).




There are a few ways to do this, all based on two ideas. The first idea is to get a recordset with the max score for each type and then join the original table to that recordset. The second idea works if your database has ranking functions: you use row_number() within each type and then filter out all records with row_number > 1.
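As a rough analogy outside SQL, the two ideas can be sketched in plain Python over a few of the question's rows, stored as hypothetical `(handle, type, score)` tuples. The function names are illustrative only; the point is the shape of each idea, not how a database executes it.

```python
from collections import defaultdict

# (handle, type, score) tuples taken from the question's sample data.
ROWS = [
    ("boathousesw15", "twitter", 500111972000056),
    ("aspall", "twitter", 500111972000018),
    ("pages/the-boathouse-putney/153429008029137", "facebook", 500111972000011),
    ("putneysocial", "twitter", 500111972000009),
]


def max_then_join(rows):
    """Idea 1: compute the max score per type, then keep rows matching it."""
    best = {}
    for _, typ, score in rows:
        best[typ] = max(score, best.get(typ, score))
    # Every row equal to its type's max survives, so ties are all kept.
    return [row for row in rows if row[2] == best[row[1]]]


def rank_then_filter(rows):
    """Idea 2: order rows within each type by score desc and keep the first."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    result = []
    for members in groups.values():
        members.sort(key=lambda row: row[2], reverse=True)  # like ORDER BY score DESC
        result.append(members[0])  # like WHERE rn = 1
    return result


if __name__ == "__main__":
    print(max_then_join(ROWS))
    print(rank_then_filter(ROWS))
```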


So the first idea could be written like this:


select *
from Table1 as T
where
    -- keep T only if its score equals the max score of its type
    exists (
        select 1
        from Table1 as TT
        where TT.type = T.type
        having max(TT.score) = T.score
    )


select T.*
from Table1 as T
    inner join (
        -- max score per type; the "score" alias is needed by the join condition
        select max(TT.score) as score, TT.type
        from Table1 as TT
        group by TT.type
    ) as TT on TT.type = T.type and TT.score = T.score

If you have ranking functions, you can also use the second idea:


with cte as (
   -- number the rows within each type, highest score first
   select *, row_number() over(partition by type order by score desc) as rn
   from Table1
)
select *
from cte
where rn = 1

You can easily replace the common table expression with a subquery:


select *
from (
   select *, row_number() over(partition by type order by score desc) as rn
   from Table1
) as a
where rn = 1


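One thing to mention: if you have more than one record with, for example, score = 500111972000056 and type = 'twitter', then the first solution will return all of those records for type = 'twitter', while the second one returns one arbitrary row for type = 'twitter'. Adding a tie-breaker such as handle asc to the row_number() order by makes that choice deterministic (provided handle is unique within each type).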
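The difference in tie handling can be seen with a small, hypothetical `Table1` in which two twitter rows share the top score. This sketch uses Python's `sqlite3` (window functions need SQLite 3.25 or newer), renames the derived table in the join to `M` for readability, and adds `handle asc` as a tie-breaker, which is not part of the queries above.

```python
import sqlite3

# Hypothetical data: two 'twitter' rows share the top score.
ROWS = [
    ("boathousesw15", "twitter", 500111972000056),
    ("aspall", "twitter", 500111972000056),
    ("pages/the-boathouse-putney/153429008029137", "facebook", 500111972000011),
]

# First idea: join each row to the max score of its type (all ties are kept).
JOIN_TO_MAX = """
select T.handle, T.type, T.score
from Table1 as T
    inner join (
        select max(TT.score) as score, TT.type
        from Table1 as TT
        group by TT.type
    ) as M on M.type = T.type and M.score = T.score
order by T.type, T.handle
"""

# Second idea: keep rn = 1; "handle asc" breaks ties deterministically.
ROW_NUMBER = """
select handle, type, score
from (
    select *, row_number() over (partition by type order by score desc, handle asc) as rn
    from Table1
) as a
where rn = 1
order by type
"""


def run(sql, rows=ROWS):
    """Load rows into an in-memory Table1 and return the query result."""
    if sqlite3.sqlite_version_info < (3, 25, 0):
        raise RuntimeError("window functions need SQLite 3.25 or newer")
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table Table1 (handle text, type text, score integer)")
        conn.executemany("insert into Table1 values (?, ?, ?)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(JOIN_TO_MAX))  # both tied twitter rows
    print(run(ROW_NUMBER))   # one twitter row
```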


Also, I forgot to mention a third idea (see @Bill Karwin's nice answer). I'll add it here:


select *
from Table1 as T
where
    -- keep T only if no row of the same type has a higher score
    not exists (
        select *
        from Table1 as TT
        where TT.type = T.type and TT.score > T.score
    )




