Select the max record of each group

Time: 2021-09-13 11:41:59

I'm using PostgreSQL. I need to select the max of each group. The table represents the products sold on each day, and I want to know the top-selling product of each day.

SELECT sum(detalle_orden.cantidad) as suma,detalle_orden.producto_id as producto
      ,to_char(date_trunc('day',orden.fecha AT TIME ZONE 'MST'),'DY') as dia
FROM detalle_orden
LEFT JOIN orden ON orden.id = detalle_orden.order_id
GROUP BY orden.fecha,detalle_orden.producto_id 
ORDER BY dia,suma desc

It returns:

suma  producto  dia
4     1         FRI
1     2         FRI
5     3         TUE
2     2         TUE

I want to get:

suma  producto  dia
4     1         FRI
5     3         TUE

Only the top product of each day (with the max(suma) of each group).

I tried different approaches, like subqueries, but the aggregate function makes things a bit difficult.

3 solutions

#1


You can still use DISTINCT ON to get this done in a single query level without a subquery, because DISTINCT is applied after GROUP BY and aggregate functions (and after window functions):

SELECT DISTINCT ON (3)
       sum(d.cantidad) AS suma
     , d.producto_id AS producto
     , to_char(o.fecha AT TIME ZONE 'MST', 'DY') AS dia
FROM   detalle_orden d
LEFT   JOIN orden o ON o.id = d.order_id
GROUP  BY o.fecha, d.producto_id 
ORDER  BY 3, 1 DESC NULLS LAST, d.producto_id;

Notes

  • This solution returns exactly one row per dia (if available). If multiple products tie for top sales, my arbitrary (but deterministic and reproducible) pick is the one with the smaller producto_id.
    If you need all products that tie for the top spot on a day, use rank() as suggested by @Houari.

  • The sequence of events in an SQL SELECT query is explained in a related answer.

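  • date_trunc() was just noise in the calculation of dia. Truncating a timestamp to the start of its day never changes the weekday, and the weekday is all that to_char(..., 'DY') reads. I removed it.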

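  • I added NULLS LAST to the descending sort order because it is unclear whether suma might be NULL in some result rows. PostgreSQL sorts NULL as larger than any non-null value, so with a plain DESC a NULL suma would come first and be picked as the "top" row.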

  • The numbers in DISTINCT ON and ORDER BY are just syntactical shorthand for convenience. They refer to output columns by position (3 is dia, 1 is suma). The added table aliases are syntactical shorthand as well.

  • Basics for DISTINCT ON

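Here is a rough Python model of DISTINCT ON (3) ... ORDER BY 3, 1 DESC NULLS LAST, d.producto_id. It is not PostgreSQL itself. It sorts by the ORDER BY keys and then keeps the first row of each dia. The sample rows are the (suma, producto, dia) values from the question's output. The helper name distinct_on_dia is made up for illustration.

```python
from itertools import groupby

# Hypothetical sample: (suma, producto, dia) rows as in the question's output.
rows = [(4, 1, "FRI"), (1, 2, "FRI"), (5, 3, "TUE"), (2, 2, "TUE")]


def distinct_on_dia(rows):
    """Rough model (hypothetical helper, not PostgreSQL) of
    SELECT DISTINCT ON (dia) ... ORDER BY dia, suma DESC NULLS LAST, producto:
    sort by the ORDER BY keys, then keep the first row of each dia."""

    def sort_key(row):
        suma, producto, dia = row
        # DESC NULLS LAST: non-NULL sums first, largest first; NULL (None) last.
        return (dia, suma is None, -(suma or 0), producto)

    ordered = sorted(rows, key=sort_key)
    # groupby only merges adjacent rows, which works because dia leads the sort key.
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[2])]


print(distinct_on_dia(rows))  # [(4, 1, 'FRI'), (5, 3, 'TUE')]
```

The sort key mimics DESC NULLS LAST by placing rows with a NULL (None) suma after all others. Ties on suma fall back to the smaller producto.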

#2


You can (ab)use SELECT DISTINCT ON with the appropriate ordering clause. Assuming you made your previous query into a view:

SELECT DISTINCT ON (dia) * FROM some_view ORDER BY dia, suma DESC;

The DISTINCT ON (dia) ensures you retain only one row for every day, and the ORDER BY ensures it retains the correct one (the row with the highest suma).

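The column list in DISTINCT ON decides what "one row per group" means. This small Python model uses a hypothetical helper distinct_on and the rows from the question's output. It keeps the first row per key after sorting by suma descending, and it assumes no NULL suma values. Keyed on (dia, producto), it keeps every row, because each pair occurs only once in this sample. Keyed on dia alone, it returns the top product per day.

```python
# Hypothetical sample: (suma, producto, dia) rows as in the question's output.
rows = [(4, 1, "FRI"), (1, 2, "FRI"), (5, 3, "TUE"), (2, 2, "TUE")]


def distinct_on(rows, key):
    """Rough model (hypothetical helper) of
    SELECT DISTINCT ON (<key>) * ... ORDER BY <key>, suma DESC.
    After sorting, keep the first row seen for each distinct key.
    Assumes suma is never NULL (None)."""
    first = {}
    for row in sorted(rows, key=lambda r: (key(r), -r[0])):
        first.setdefault(key(row), row)  # later rows with the same key are dropped
    return list(first.values())


per_day_and_product = distinct_on(rows, key=lambda r: (r[2], r[1]))
per_day = distinct_on(rows, key=lambda r: r[2])
print(len(per_day_and_product))  # 4: nothing is filtered out
print(per_day)  # [(4, 1, 'FRI'), (5, 3, 'TUE')]
```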

#3


With the window function rank() you can easily get it:

select * from
(
select suma,producto,dia, rank() over (partition by dia order by suma desc) as ranking
from your_query
)A
where ranking = 1

So your final query will be something like:

select * from
(
select suma, producto, dia, rank() over (partition by dia order by suma desc) as ranking
from
(
  SELECT sum(detalle_orden.cantidad) as suma
       , detalle_orden.producto_id as producto
       , to_char(date_trunc('day', orden.fecha AT TIME ZONE 'MST'), 'DY') as dia
  FROM detalle_orden
  LEFT JOIN orden ON orden.id = detalle_orden.order_id
  GROUP BY orden.fecha, detalle_orden.producto_id
) B
) A
where ranking = 1
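As a rough check of how ties behave, the sketch below runs the same rank() pattern against an in-memory SQLite database through Python's standard sqlite3 module. SQLite stands in for PostgreSQL here and needs version 3.25 or newer for window functions. The table name daily_sales is hypothetical. It holds already-aggregated rows: the question's sample plus one made-up tie on FRI.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; rank() OVER (...) has the same
# semantics here. daily_sales is a hypothetical table of pre-aggregated rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (suma INTEGER, producto INTEGER, dia TEXT)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?, ?)",
    # The question's sample rows plus one hypothetical tie on FRI.
    [(4, 1, "FRI"), (1, 2, "FRI"), (5, 3, "TUE"), (2, 2, "TUE"), (4, 7, "FRI")],
)

query = """
SELECT suma, producto, dia
FROM (
    SELECT suma, producto, dia,
           rank() OVER (PARTITION BY dia ORDER BY suma DESC) AS ranking
    FROM daily_sales
) A
WHERE ranking = 1
ORDER BY dia, producto
"""
top = conn.execute(query).fetchall()
print(top)  # [(4, 1, 'FRI'), (4, 7, 'FRI'), (5, 3, 'TUE')]
conn.close()
```

Both FRI products with suma 4 come back, because rank() gives tied rows the same rank. DISTINCT ON would keep only one of them.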
