I am trying to select the max value from one column while grouping by another, non-unique id column that contains many duplicate values. The original table looks something like:
mukey | comppct_r | name | type
65789 | 20 | a | 7n
65789 | 15 | b | 8m
65789 | 1 | c | 1o
65790 | 10 | a | 7n
65790 | 26 | b | 8m
65790 | 5 | c | 1o
...
Getting the maximum per mukey works just fine using:
SELECT c.mukey, Max(c.comppct_r) AS ComponentPercent
FROM c
GROUP BY c.mukey;
This returns a table like:
mukey | ComponentPercent
65789 | 20
65790 | 26
65791 | 50
65792 | 90
I want to add other columns, such as name and type, to the output without changing the grouping, so that the result looks like:
mukey | comppct_r | name | type
65789 | 20 | a | 7n
65790 | 26 | b | 8m
65791 | 50 | c | 7n
65792 | 90 | d | 7n
but I always get an error saying that I need to use an aggregate function in the SELECT statement. How should I go about doing this?
4 Solutions
#1
17
You have yourself a greatest-n-per-group problem. This is one of the possible solutions:
-- join each row to its group's maximum; the outer table is aliased yt
select yt.mukey, yt.comppct_r, yt.name, yt.type
from c yt
inner join (
    select mukey, max(comppct_r) as comppct_r
    from c
    group by mukey
) ss on yt.mukey = ss.mukey and yt.comppct_r = ss.comppct_r;
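To try the join approach without a database server, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows are the ones from the question. The function name `top_rows_by_join`, the column types, and the choice of SQLite are assumptions made for illustration; the asker's actual engine is not stated.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (65789, 20, "a", "7n"),
    (65789, 15, "b", "8m"),
    (65789, 1, "c", "1o"),
    (65790, 10, "a", "7n"),
    (65790, 26, "b", "8m"),
    (65790, 5, "c", "1o"),
]

# Join every row to the per-mukey maximum computed in a derived table.
QUERY = """
SELECT yt.mukey, yt.comppct_r, yt.name, yt.type
FROM c AS yt
INNER JOIN (
    SELECT mukey, MAX(comppct_r) AS comppct_r
    FROM c
    GROUP BY mukey
) AS ss ON yt.mukey = ss.mukey AND yt.comppct_r = ss.comppct_r
ORDER BY yt.mukey
"""


def top_rows_by_join(rows):
    """Return the full row holding the largest comppct_r for each mukey."""
    conn = sqlite3.connect(":memory:")
    # Column types are assumed; the thread does not give the schema.
    conn.execute(
        "CREATE TABLE c (mukey INTEGER, comppct_r INTEGER, name TEXT, type TEXT)"
    )
    conn.executemany("INSERT INTO c VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in top_rows_by_join(ROWS):
        print(row)
```

With the six sample rows this yields one row per mukey, carrying the name and type of the row that holds the maximum.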
Another possible approach, same output:
select c1.*
from c c1
left outer join c c2
on (c1.mukey = c2.mukey and c1.comppct_r < c2.comppct_r)
where c2.mukey is null;
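The self-join works because a row survives only when no other row in the same mukey has a larger comppct_r. A small sketch, again assuming SQLite and the question's sample rows (the function name `top_rows_by_anti_join` is made up for this example):

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (65789, 20, "a", "7n"),
    (65789, 15, "b", "8m"),
    (65789, 1, "c", "1o"),
    (65790, 10, "a", "7n"),
    (65790, 26, "b", "8m"),
    (65790, 5, "c", "1o"),
]

# c2 matches only rows of the same mukey with a strictly larger comppct_r.
# If no such row exists, the c2 columns are NULL and c1 is the group maximum.
QUERY = """
SELECT c1.mukey, c1.comppct_r, c1.name, c1.type
FROM c AS c1
LEFT OUTER JOIN c AS c2
  ON c1.mukey = c2.mukey AND c1.comppct_r < c2.comppct_r
WHERE c2.mukey IS NULL
ORDER BY c1.mukey, c1.name
"""


def top_rows_by_anti_join(rows):
    """Return rows for which no row in the same mukey has a larger comppct_r."""
    conn = sqlite3.connect(":memory:")
    # Column types are assumed; the thread does not give the schema.
    conn.execute(
        "CREATE TABLE c (mukey INTEGER, comppct_r INTEGER, name TEXT, type TEXT)"
    )
    conn.executemany("INSERT INTO c VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in top_rows_by_anti_join(ROWS):
        print(row)
```

On the sample data it returns the same two rows as the derived-table join.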
There's a comprehensive and explanatory answer on the topic here: SQL Select only rows with Max Value on a Column
#2
1
Every non-aggregated column in the SELECT list has to appear in the GROUP BY clause. Why? Consider this table:
t1
x1 y1 z1
1 2 5
2 2 7
Now suppose you try to write a query like:
select x1,y1,max(z1) from t1 group by y1;
This query produces only one row, but what should the value of x1 be, 1 or 2? The result is undefined, so most databases reject the query with an error rather than guess.
Coming to the point: you can either choose an aggregate function for x1 or add x1 to the GROUP BY. Which one is right depends on your requirement.
If you want every row, each paired with the maximum z1 of its y1 group, you can use a correlated subquery:
Select x1,y1,(select max(z1) from t1 where tt.y1=y1 group by y1)
from t1 tt;
This will produce a result like:
t1
x1 y1 max(z1)
1 2 7
2 2 7
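The result above can be reproduced with a short sketch. It assumes SQLite (via Python's standard `sqlite3` module) and uses the two-row t1 table from this answer. The alias `max_z1`, the inner table alias `t`, and the function name are illustrative additions.

```python
import sqlite3

# The two rows of t1 shown in this answer: (x1, y1, z1).
T1_ROWS = [(1, 2, 5), (2, 2, 7)]

# For each outer row tt, the subquery computes the maximum z1 among
# all rows that share tt's y1 value.
QUERY = """
SELECT tt.x1,
       tt.y1,
       (SELECT MAX(t.z1) FROM t1 AS t WHERE t.y1 = tt.y1) AS max_z1
FROM t1 AS tt
ORDER BY tt.x1
"""


def rows_with_group_max(rows):
    """Return every row of t1 with the max z1 of its y1 group attached."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (x1 INTEGER, y1 INTEGER, z1 INTEGER)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in rows_with_group_max(T1_ROWS):
        print(row)
```

Unlike the GROUP BY query, this keeps both input rows; no row is collapsed, so x1 is never ambiguous.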
#3
0
Try using a derived table (a subquery in the FROM clause) as follows:
SELECT vt.*, c.name
FROM (
    SELECT c.mukey, Max(c.comppct_r) AS ComponentPercent
    FROM c
    GROUP BY c.mukey
) AS vt, c
WHERE vt.mukey = c.mukey
  AND vt.ComponentPercent = c.comppct_r;  -- keep only the row holding the maximum
#4
0
You can't just add additional columns without either adding them to the GROUP BY or applying an aggregate function. The reason is that the values of a column can differ within one group. For example, you could have two rows:
mukey | comppct_r | name | type
65789 | 20 | a | 7n
65789 | 20 | b | 9f
What should the aggregated group look like for the columns name and type?
If name and type are always the same within a group, just add them to the GROUP BY clause:
SELECT c.mukey, Max(c.comppct_r) AS ComponentPercent, c.name, c.type
FROM c
GROUP BY c.mukey, c.name, c.type;
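Whether this gives one row per mukey depends entirely on the data. The sketch below, assuming SQLite, runs the query on two data sets: a made-up one (`CONSTANT_ROWS`, invented for this example) where name and type never vary within a mukey, and the question's own rows, where they do.

```python
import sqlite3

# Hypothetical data: name and type are constant within each mukey.
CONSTANT_ROWS = [
    (1, 10, "a", "x"),
    (1, 20, "a", "x"),
    (2, 5, "b", "y"),
]

# Rows from the question: name and type differ within each mukey.
QUESTION_ROWS = [
    (65789, 20, "a", "7n"),
    (65789, 15, "b", "8m"),
    (65789, 1, "c", "1o"),
    (65790, 10, "a", "7n"),
    (65790, 26, "b", "8m"),
    (65790, 5, "c", "1o"),
]

QUERY = """
SELECT mukey, MAX(comppct_r) AS ComponentPercent, name, type
FROM c
GROUP BY mukey, name, type
ORDER BY mukey, name
"""


def group_by_all_columns(rows):
    """Group by mukey, name, and type, returning one row per distinct triple."""
    conn = sqlite3.connect(":memory:")
    # Column types are assumed; the thread does not give the schema.
    conn.execute(
        "CREATE TABLE c (mukey INTEGER, comppct_r INTEGER, name TEXT, type TEXT)"
    )
    conn.executemany("INSERT INTO c VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(group_by_all_columns(CONSTANT_ROWS))
    print(len(group_by_all_columns(QUESTION_ROWS)))
```

On the question's data every (mukey, name, type) combination is distinct, so each of the six rows forms its own group and nothing is reduced; one of the join-based queries is the better fit there.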