Must appear in the GROUP BY clause or be used in an aggregate function?

I have a table called 'makerar' that looks like this:

 cname  | wmname |          avg           
--------+--------+------------------------
 canada | zoro   |     2.0000000000000000
 spain  | luffy  | 1.00000000000000000000
 spain  | usopp  |     5.0000000000000000

And I want to select the maximum avg for each cname.

SELECT cname, wmname, MAX(avg)  FROM makerar GROUP BY cname;

But I get an error:

ERROR:  column "makerar.wmname" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT cname, wmname, MAX(avg)  FROM makerar GROUP BY cname;

So I tried this:

SELECT cname, wmname, MAX(avg)  FROM makerar GROUP BY cname, wmname;

However, this does not give the intended result; it produces the output below instead:

 cname  | wmname |          max           
--------+--------+------------------------
 canada | zoro   |     2.0000000000000000
 spain  | luffy  | 1.00000000000000000000
 spain  | usopp  |     5.0000000000000000

The expected result is:

 cname  | wmname |          max           
--------+--------+------------------------
 canada | zoro   |     2.0000000000000000
 spain  | usopp  |     5.0000000000000000

How can I go about fixing this issue?

Note: This table is a VIEW created from a previous operation.

6 solutions

#1

Yes, this is a common aggregation problem. Before SQL3 (1999), the selected fields had to appear in the GROUP BY clause[*].

To work around this, compute the aggregate in a subquery and then join it back to the table to get the additional columns you need to show:

SELECT m.cname, m.wmname, t.mx
FROM (
    SELECT cname, MAX(avg) AS mx
    FROM makerar
    GROUP BY cname
    ) t JOIN makerar m ON m.cname = t.cname AND t.mx = m.avg
;

 cname  | wmname |          mx           
--------+--------+------------------------
 canada | zoro   |     2.0000000000000000
 spain  | usopp  |     5.0000000000000000
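
To try the subquery-and-join approach without a Postgres server, here is a minimal sketch using Python's built-in sqlite3 module. The three rows are copied from the question; running against an in-memory SQLite database instead of Postgres is an assumption made only to keep the snippet self-contained, and the ORDER BY is added just to make the output order predictable.

```python
import sqlite3

# In-memory SQLite stands in for Postgres here (an assumption for portability).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE makerar (cname TEXT, wmname TEXT, avg REAL)")
conn.executemany(
    "INSERT INTO makerar VALUES (?, ?, ?)",
    [("canada", "zoro", 2.0), ("spain", "luffy", 1.0), ("spain", "usopp", 5.0)],
)

# Aggregate per cname in a subquery, then join back to recover wmname.
query = """
    SELECT m.cname, m.wmname, t.mx
    FROM (
        SELECT cname, MAX(avg) AS mx
        FROM makerar
        GROUP BY cname
    ) t JOIN makerar m ON m.cname = t.cname AND t.mx = m.avg
    ORDER BY m.cname
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If two rows in the same cname share the maximum avg, this query returns both of them.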

You can also use window functions, which look simpler:

SELECT cname, wmname, MAX(avg) OVER (PARTITION BY cname) AS mx
FROM makerar
;

The one caveat with this method is that it returns every row, because window functions do not collapse groups. Each row does carry the correct MAX for its cname, so whether that suits you is up to you:

 cname  | wmname |          mx           
--------+--------+------------------------
 canada | zoro   |     2.0000000000000000
 spain  | luffy  |     5.0000000000000000
 spain  | usopp  |     5.0000000000000000

To show only the (cname, wmname) tuples matching the max value, an arguably less elegant solution is:

SELECT DISTINCT /* DISTINCT guards against duplicate rows coming out of the self-join */
    m.cname, m.wmname, t.avg AS mx
FROM (
    SELECT cname, wmname, avg,
           ROW_NUMBER() OVER (PARTITION BY cname ORDER BY avg DESC) AS rn
    FROM makerar
) t JOIN makerar m ON m.cname = t.cname AND m.wmname = t.wmname AND t.rn = 1
;


 cname  | wmname |          mx           
--------+--------+------------------------
 canada | zoro   |     2.0000000000000000
 spain  | usopp  |     5.0000000000000000

[*]: Interestingly enough, even though the spec sort of allows selecting non-grouped fields, major engines do not seem to like it. Oracle and SQL Server don't allow it at all. MySQL used to allow it by default, but since 5.7 the ONLY_FULL_GROUP_BY SQL mode is enabled by default, so the administrator has to disable it in the server configuration to get the old behavior back.

#2

In Postgres, you can also use the special DISTINCT ON (expression) syntax:

SELECT DISTINCT ON (cname) 
    cname, wmname, avg
FROM 
    makerar 
ORDER BY 
    cname, avg DESC ;

#3

SELECT t1.cname, t1.wmname, t2.max
FROM makerar t1 JOIN (
    SELECT cname, MAX(avg) max
    FROM makerar
    GROUP BY cname ) t2
ON t1.cname = t2.cname AND t1.avg = t2.max;

Using rank() window function:

SELECT cname, wmname, avg
FROM (
    SELECT cname, wmname, avg, rank() 
    OVER (PARTITION BY cname ORDER BY avg DESC)
    FROM makerar) t
WHERE rank = 1;

Note

Both queries keep every row that ties for the maximum within a group. If you want only a single record per group even when more than one record has avg equal to the maximum, see @ypercube's answer.
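
The difference between keeping ties and keeping one row per group can be seen in a small sketch. It uses Python's sqlite3 module with an in-memory table (an assumption, chosen so it runs without Postgres; it needs an SQLite build with window functions, 3.25 or later). The tie between luffy and usopp is made-up data, and row_number() is used here only as a portable stand-in for "one row per group"; it is not the DISTINCT ON query from @ypercube's answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE makerar (cname TEXT, wmname TEXT, avg REAL)")
conn.executemany(
    "INSERT INTO makerar VALUES (?, ?, ?)",
    [
        ("canada", "zoro", 2.0),
        ("spain", "luffy", 5.0),  # hypothetical tie with usopp
        ("spain", "usopp", 5.0),
    ],
)

def top_rows(window_fn):
    """Return the rows ranked first per cname; window_fn is 'rank' or 'row_number'."""
    query = f"""
        SELECT cname, wmname, avg
        FROM (
            SELECT cname, wmname, avg,
                   {window_fn}() OVER (PARTITION BY cname ORDER BY avg DESC) AS rnk
            FROM makerar
        ) t
        WHERE rnk = 1
        ORDER BY cname, wmname
    """
    return conn.execute(query).fetchall()

with_ties = top_rows("rank")        # both tied spain rows survive
one_each = top_rows("row_number")   # exactly one row per cname

print(with_ties)
print(one_each)
```

Which of the tied rows row_number() keeps is not defined unless the ORDER BY inside the window adds a tiebreaker column.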

#4

The problem with selecting non-grouped, non-aggregated fields in a GROUP BY query is that the engine has no way of knowing which record's value it should return. The first? The last? There is usually no record that naturally corresponds to the aggregated result (min and max are exceptions).

However, there is a workaround: aggregate the required field as well. In Postgres, this should work:

SELECT cname, (array_agg(wmname ORDER BY avg DESC))[1], MAX(avg)
FROM makerar GROUP BY cname;

Note that this builds an array of all wmname values, ordered by avg descending, and returns the first element (arrays in Postgres are 1-based).

#5

I recently ran into this problem when trying to count with CASE WHEN, and found that changing the order of the CASE and COUNT expressions fixes it:

SELECT date(dateday) AS pick_day,
       COUNT(CASE WHEN (apples = 'TRUE' OR oranges = 'TRUE') THEN fruit END) AS fruit_counter
FROM pickings
GROUP BY 1

Instead of the following form, which gave me errors saying that apples and oranges should appear in the GROUP BY clause or in aggregate functions:

CASE WHEN (apples = 'TRUE' OR oranges = 'TRUE') THEN COUNT(*) END AS fruit_counter
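
As a rough illustration of the working form (CASE inside COUNT), the sketch below runs it through Python's sqlite3 module. The pickings table layout and its four rows are invented for the example, since the answer does not show the real schema, and SQLite stands in for Postgres only so the snippet is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the answer only names these four columns.
conn.execute(
    "CREATE TABLE pickings (dateday TEXT, apples TEXT, oranges TEXT, fruit TEXT)"
)
conn.executemany(
    "INSERT INTO pickings VALUES (?, ?, ?, ?)",
    [
        ("2020-01-01 08:00:00", "TRUE", "FALSE", "a"),
        ("2020-01-01 09:00:00", "FALSE", "FALSE", "b"),
        ("2020-01-01 10:00:00", "FALSE", "TRUE", "c"),
        ("2020-01-02 08:00:00", "FALSE", "FALSE", "d"),
    ],
)

# The CASE sits inside COUNT, so apples and oranges are only referenced
# within an aggregate; COUNT skips the NULLs produced by non-matching rows.
query = """
    SELECT date(dateday) AS pick_day,
           COUNT(CASE WHEN (apples = 'TRUE' OR oranges = 'TRUE') THEN fruit END)
               AS fruit_counter
    FROM pickings
    GROUP BY 1
    ORDER BY 1
"""
counts = conn.execute(query).fetchall()
print(counts)
```

Days with no matching rows still appear with a count of 0, because the CASE yields NULL for them and COUNT ignores NULLs.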

#6

This seems to work as well

SELECT *
FROM makerar m1
WHERE m1.avg = (SELECT MAX(avg)
                FROM makerar m2
                WHERE m1.cname = m2.cname
               )
