GROUP BY behavior when no aggregate functions are present in the SELECT clause

Time: 2021-10-26 10:16:24

I have a table emp with following structure and data:

name   dept    salary
-----  -----   -----
Jack   a       2
Jill   a       1
Tom    b       2
Fred   b       1

When I execute the following SQL:

SELECT * FROM emp GROUP BY dept

I get the following result:

name   dept    salary
-----  -----   -----
Jill   a       1
Fred   b       1

On what basis did the server decide to return Jill and Fred and exclude Jack and Tom?

I am running this query in MySQL.

Note 1: I know the query doesn't make sense on its own. I am trying to debug a problem with a 'GROUP BY' scenario. I am trying to understand the default behavior for this purpose.

Note 2: I am used to writing the SELECT clause the same as the GROUP BY clause (minus the aggregate fields). When I came across the behavior described above, I started wondering whether I can rely on it for scenarios such as selecting the rows from the emp table where the salary is the lowest/highest in the dept. For example, an SQL statement like this works on MySQL:

SELECT A.*, MIN(A.salary) AS min_salary FROM emp AS A GROUP BY A.dept

I didn't find any material describing why such SQL works or, more importantly, whether I can rely on such behavior consistently. If this is reliable behavior, then I can avoid queries like:

SELECT A.* FROM emp AS A WHERE A.salary = ( 
            SELECT MAX(B.salary) FROM emp B WHERE B.dept = A.dept)

8 answers

#1


34  

Read MySQL documentation on this particular point.

In a nutshell, MySQL allows some columns to be omitted from the GROUP BY clause for performance purposes. However, this works only if the omitted columns all have the same value within each group; otherwise, the values returned by the query are indeed indeterminate, as others in this post have correctly guessed. Note that adding an ORDER BY clause would not re-introduce any form of deterministic behavior.

Although not at the core of the issue, this example shows how using * rather than an explicit enumeration of desired columns is often a bad idea.

Excerpt from MySQL 5.0 documentation:

When using this feature, all rows in each group should have the same values
for the columns that are omitted from the GROUP BY part. The server is free
to return any value from the group, so the results are indeterminate unless
all values are the same. 
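
For the lowest-salary-per-department case raised in the question, one way to avoid depending on indeterminate values is to compute the aggregate in a derived table and join back to it. The following is only a minimal sketch: the column types and the aliases e and m are assumptions made for illustration, since the question shows the data but not the table definition.

```sql
-- Assumed schema: the column types are a guess; the question only shows the data.
CREATE TABLE emp (
    name   VARCHAR(20),
    dept   CHAR(1),
    salary INT
);

INSERT INTO emp (name, dept, salary) VALUES
    ('Jack', 'a', 2),
    ('Jill', 'a', 1),
    ('Tom',  'b', 2),
    ('Fred', 'b', 1);

-- Step 1 (derived table m): lowest salary per department.
-- Step 2: join back to emp to fetch the full rows that carry that salary.
SELECT e.name, e.dept, e.salary
FROM emp AS e
JOIN (
    SELECT dept, MIN(salary) AS min_salary
    FROM emp
    GROUP BY dept
) AS m
  ON m.dept = e.dept
 AND m.min_salary = e.salary
ORDER BY e.dept, e.name;
```

With the sample data this returns the rows for Jill and Fred. If two employees in a department share the lowest salary, both rows are returned, which makes the tie explicit instead of leaving the choice to the server.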

#2


9  

This is a bit late, but I'll put this up for future reference.

GROUP BY takes the first row it encounters for each group and discards any later rows in the result set that match it. So if Jack and Tom have the same department, whoever appears first in a normal SELECT will be the resulting row in the GROUP BY.

If you want to control what appears first in the list, you need to do an ORDER BY. However, SQL does not allow ORDER BY to come before GROUP BY; doing so raises an error. The best workaround for this issue is to do the ORDER BY in a subquery and then the GROUP BY in the outer query. Here's an example:

SELECT * FROM (SELECT * FROM emp ORDER BY name) as foo GROUP BY dept

This is the best performing technique I've found. I hope this helps someone out.
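
Since the server may not be obliged to preserve the ordering of a derived table, a version that states the "first row per department" rule explicitly can be safer. The following is a hedged sketch, assuming MySQL 8.0 or later (which supports window functions) and the emp table from the question; the aliases rn and ranked are made up for this example.

```sql
-- Number the rows inside each department by name, then keep row 1.
-- rn and ranked are illustrative names, not part of the original post.
SELECT name, dept, salary
FROM (
    SELECT name, dept, salary,
           ROW_NUMBER() OVER (PARTITION BY dept ORDER BY name) AS rn
    FROM emp
) AS ranked
WHERE rn = 1;
```

With the sample data this returns Jack for department a and Fred for department b. Changing the ORDER BY inside OVER (...) to salary, or salary DESC, picks the lowest-paid or highest-paid row instead.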

#3


4  

As far as I know, for your purposes the specific rows returned can be considered random.

Ordering only takes place after the GROUP BY is done.

#4


2  

You can put:

SET sql_mode = 'ONLY_FULL_GROUP_BY'

before your query to enforce SQL-standard GROUP BY behavior.
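
As a rough illustration of what that setting changes, the sketch below assumes the emp table from the question and a session in which altering sql_mode is permitted. Note that assigning a single value like this replaces the session's entire sql_mode rather than adding to it.

```sql
-- Enable strict GROUP BY checking for the current session only.
-- This overwrites any other modes that were set for the session.
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- With the mode enabled, MySQL should reject this statement with
-- error 1055 (ER_WRONG_FIELD_WITH_GROUP), because name and salary are
-- neither listed in GROUP BY nor wrapped in an aggregate function.
SELECT * FROM emp GROUP BY dept;

-- This form is accepted: every selected column is either grouped
-- or aggregated.
SELECT dept, MIN(salary) AS min_salary, MAX(salary) AS max_salary
FROM emp
GROUP BY dept;
```

From MySQL 5.7.5 onward, ONLY_FULL_GROUP_BY is part of the default sql_mode, so on newer servers the query from the question is typically rejected without any extra configuration.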

#5


1  

I find that the best thing to do is to consider this type of query unsupported. In most other database systems, the HAVING, SELECT, and ORDER BY clauses cannot reference columns that are neither in the GROUP BY clause nor wrapped in an aggregate function.

Instead, think of your query as if it read (with ANY standing for a conceptual aggregate that returns an arbitrary value from the group):

SELECT ANY(name), dept, ANY(salary)
FROM emp 
GROUP BY dept;

...since this is what's going on.
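
MySQL 5.7.5 and later ship a function that expresses this idea directly, ANY_VALUE(). A minimal sketch, assuming such a server version and the emp table from the question:

```sql
-- ANY_VALUE() returns an arbitrary value from each group and tells the
-- server not to apply the ONLY_FULL_GROUP_BY check to that column.
SELECT ANY_VALUE(name)   AS name,
       dept,
       ANY_VALUE(salary) AS salary
FROM emp
GROUP BY dept;
```

This makes the indeterminacy visible in the query text. It does not make the result predictable: which value is picked for name or salary remains up to the server.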

Hope this helps....

#6


0  

I think ANSI SQL requires that the SELECT list include only fields from the GROUP BY clause, plus aggregate functions. MySQL appears to return some row, possibly the last one the server read or any row it had at hand, but don't rely on that.

#7


-1  

Try using ORDER BY to pick the row that you want.

SELECT * FROM emp GROUP BY dept ORDER BY name ASC;

This will return the following:

name   dept    salary
-----  -----   -----
Jack   a       2
Fred   b       1

#8


-1  

If you are grouping by department, do the other columns matter? I know SQL Server will not even allow this query. If they do matter, it sounds like there might be other issues.
