Can I use non-aggregate columns with GROUP BY?

Date: 2022-04-27 20:47:57

You cannot (should not) put non-aggregates in the SELECT line of a GROUP BY query.

I would, however, like to access one of the non-aggregates associated with the max. In plain English, I want a table with the oldest id of each kind.

CREATE TABLE stuff (
   id int,
   kind int,
   age int
);

This query gives me the information I'm after:

SELECT kind, MAX(age)
FROM stuff
GROUP BY kind;

But it's not in the most useful form. I really want the id associated with each row so I can use it in later queries.

I'm looking for something like this:

SELECT id, kind, MAX(age)
FROM stuff
GROUP BY kind;

That would give the same output as this:

SELECT stuff.*
FROM
   stuff,
   ( SELECT kind, MAX(age) AS age  -- alias so that maxes.age resolves
     FROM stuff
     GROUP BY kind) maxes
WHERE
   stuff.kind = maxes.kind AND
   stuff.age = maxes.age;

It really seems like there should be a way to get this information without needing to join. I just need the SQL engine to remember the other columns when it's calculating the max.

6 solutions

#1


10  

You can't get the Id of the row that MAX found, because there might not be only one id with the maximum age.
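
To make the tie case concrete, here is a minimal sketch that runs the question's join through Python's built-in sqlite3 module. The sample rows are invented for illustration; ids 2 and 3 share the maximum age for kind 1, so both come back.

```python
import sqlite3

# In-memory database with the question's schema; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id int, kind int, age int)")
conn.executemany(
    "INSERT INTO stuff VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 30), (3, 1, 30), (4, 2, 5)],
)

# Join each row against the per-kind maximum age.
tied_rows = conn.execute("""
    SELECT stuff.id, stuff.kind, stuff.age
    FROM stuff
    JOIN (SELECT kind, MAX(age) AS age FROM stuff GROUP BY kind) maxes
      ON stuff.kind = maxes.kind AND stuff.age = maxes.age
    ORDER BY stuff.id
""").fetchall()

print(tied_rows)  # kind 1 appears twice because two ids tie for the max age
```

Any approach that must return exactly one id per kind therefore needs an explicit tie-breaking rule.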

#2


4  

You cannot (should not) put non-aggregates in the SELECT line of a GROUP BY query.

You can, and have to, define what you are grouping by for the aggregate function to return the correct result.

MySQL (and SQLite) decided in their infinite wisdom that they would go against spec, and allow queries to accept GROUP BY clauses missing columns quoted in the SELECT - it effectively makes these queries not portable.
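
As an illustration of that non-portable behaviour, the sketch below runs the question's wished-for query in SQLite through Python's sqlite3 module. It relies on SQLite's documented rule that, in an aggregate query using a single MIN() or MAX(), bare columns take their values from the row holding that minimum or maximum. That rule is specific to SQLite; other engines reject the query or return an arbitrary id. The sample rows are invented.

```python
import sqlite3

# In-memory database with the question's schema; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id int, kind int, age int)")
conn.executemany(
    "INSERT INTO stuff VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 30), (3, 2, 5), (4, 2, 7)],
)

# "id" is a bare (non-aggregated, non-grouped) column here.
# SQLite fills it from the row that produced MAX(age); this is not standard SQL.
bare_rows = conn.execute("""
    SELECT id, kind, MAX(age)
    FROM stuff
    GROUP BY kind
    ORDER BY kind
""").fetchall()

print(bare_rows)
```

If two rows tie for the maximum, which id SQLite reports is not something to rely on, so this shortcut does not remove the tie problem.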

It really seems like there should be a way to get this information without needing to join.

Without access to analytic/ranking/windowing functions (which MySQL did not support before version 8.0), the self join to a derived table/inline view is the most portable means of getting the result you desire.

#3


2  

I think it's tempting indeed to ask the system to solve the problem in one pass rather than having to do the job twice (find the max, and then find the corresponding id). You can do it using CONCAT (as suggested in the article Naktibalda referred to), though I'm not sure that would be more efficient.

-- Zero-pad age so that string comparison orders the same way as numeric comparison
SELECT MAX(CONCAT(LPAD(age, 10, '0'), '-', id))
FROM stuff
GROUP BY kind;

This should work; you have to split the result to get the age and the id. (It's really ugly, though.)
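
Here is a rough sketch of the pack-and-split idea, translated to SQLite so it can run through Python's sqlite3 module. SQLite has no LPAD, so printf('%010d', ...) and the || operator stand in for LPAD and CONCAT; that substitution, the sample rows, and the Python-side split are my assumptions, not part of the answer above. It only works for non-negative ages of at most 10 digits.

```python
import sqlite3

# In-memory database with the question's schema; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id int, kind int, age int)")
conn.executemany(
    "INSERT INTO stuff VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 30), (3, 2, 5), (4, 2, 7)],
)

# Pack the zero-padded age and the id into one string, then take the max per kind.
packed_rows = conn.execute("""
    SELECT kind, MAX(printf('%010d', age) || '-' || id) AS packed
    FROM stuff
    GROUP BY kind
    ORDER BY kind
""").fetchall()

# Split each packed string back into (kind, age, id).
unpacked = []
for kind, packed in packed_rows:
    age_text, id_text = packed.split("-", 1)
    unpacked.append((kind, int(age_text), int(id_text)))

print(unpacked)
```

On a tie, the string comparison falls through to the id text, so the winner is the id that sorts last as a string, which is not necessarily the largest number.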

#4


2  

In recent databases you can use max() over (partition by ...) to solve this problem:

select id, kind, age as max_age from (
  select id, kind, age, max(age) over (partition by kind) as mage
    from stuff) t  -- most databases require an alias on the derived table
where age = mage;

This can then be done in a single pass.
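
The window-function query can be tried out with Python's sqlite3 module, as in the sketch below. It assumes the bundled SQLite library is version 3.25 or newer (the first release with window functions), and the sample rows are invented. Whether the engine really makes a single pass depends on its query planner, which this example does not check.

```python
import sqlite3

# In-memory database with the question's schema; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id int, kind int, age int)")
conn.executemany(
    "INSERT INTO stuff VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 30), (3, 1, 30), (4, 2, 5)],
)

# Attach the per-kind maximum to every row, then keep the rows that match it.
window_rows = conn.execute("""
    SELECT id, kind, age AS max_age
    FROM (
        SELECT id, kind, age, MAX(age) OVER (PARTITION BY kind) AS mage
        FROM stuff
    ) t
    WHERE age = mage
    ORDER BY id
""").fetchall()

print(window_rows)  # ties are kept: ids 2 and 3 both have the max age for kind 1
```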

#5


1  

You have to have a join because the aggregate function max retrieves many rows and chooses the max. So you need a join to choose the one that the aggregate function has found.

To put it a different way, how would you expect the query to behave if you replaced max with sum?

An inner join might be more efficient than your subquery, though.

#6


0  

PostgreSQL's DISTINCT ON will be useful here.

SELECT DISTINCT ON (kind) kind, id, age 
FROM stuff
ORDER BY kind, age DESC;

This groups by kind and returns the first row of each group in the sort order. As we have ordered by age in descending order, we will get the row with the max age for each kind.

P.S. The columns in DISTINCT ON must appear first in the ORDER BY.
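
DISTINCT ON is specific to PostgreSQL. Where it is not available, a similar one-row-per-kind result can be sketched with ROW_NUMBER(), shown here through Python's sqlite3 module. This is a substitute technique rather than DISTINCT ON itself; it assumes SQLite 3.25 or newer, the sample rows are invented, and breaking ties on the lowest id is my own choice.

```python
import sqlite3

# In-memory database with the question's schema; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id int, kind int, age int)")
conn.executemany(
    "INSERT INTO stuff VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 30), (3, 1, 30), (4, 2, 5)],
)

# Number the rows within each kind, oldest first; id breaks ties (an assumed rule).
first_rows = conn.execute("""
    SELECT kind, id, age
    FROM (
        SELECT kind, id, age,
               ROW_NUMBER() OVER (PARTITION BY kind ORDER BY age DESC, id) AS rn
        FROM stuff
    ) ranked
    WHERE rn = 1
    ORDER BY kind
""").fetchall()

print(first_rows)  # exactly one row per kind, even though ids 2 and 3 tie
```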
