SQL:聚合函数和组by

时间:2021-09-17 22:58:40

Consider the Oracle emp table. I'd like to get the employees with the top salary with department = 20 and job = clerk. Also assume that there is no "empno" column, and that the primary key involves a number of columns. You can do this with:

考虑Oracle emp表。我要找的是薪水最高的部门= 20,职位=职员。还要假设没有“empno”列,主键包含许多列。你可以这样做:

select * from scott.emp
where deptno = 20 and job = 'CLERK'
and sal =  (select max(sal) from scott.emp
            where deptno = 20 and job = 'CLERK')

This works, but I have to duplicate the test deptno = 20 and job = 'CLERK', which I would like to avoid. Is there a more elegant way to write this, maybe using a group by? BTW, if this matters, I am using Oracle.

这是可行的,但我必须重复测试deptno = 20和job = '办事员',我想避免。有没有一种更优雅的方式来写这个,也许用group by?顺便说一句,如果这很重要,我正在使用Oracle。

6 个解决方案

#1


3  

The following is slightly over-engineered, but is a good SQL pattern for "top x" queries.

下面的代码稍微有些夸张,但是对于“top x”查询来说,这是一个很好的SQL模式。

SELECT 
 * 
FROM 
 scott.emp
WHERE 
 (deptno,job,sal) IN
 (SELECT 
   deptno,
   job,
   max(sal) 
  FROM 
   scott.emp
  WHERE 
   deptno = 20 
   and job = 'CLERK'
  GROUP BY 
   deptno,
   job
  )

Also note that this will work in Oracle and Postgress (i think) but not MS SQL. For something similar in MS SQL see question SQL Query to get latest price

还需要注意的是,这在Oracle和Postgress(我认为)中可以工作,但在MS SQL中不能工作。有关MS SQL中的类似内容,请参阅问题SQL查询以获取最新价格

#2


2  

If I was certain of the targeted database I'd go with Mark Nold's solution, but if you ever want some dialect agnostic SQL*, try

如果我确定了目标数据库,我将使用Mark Nold的解决方案,但是如果您想要一些方言无关的SQL*,试试。

SELECT * 
FROM scott.emp e
WHERE e.deptno = 20 
AND e.job = 'CLERK'
AND e.sal = (
  SELECT MAX(e2.sal) 
  FROM scott.emp e2
  WHERE e.deptno = e2.deptno 
  AND e.job = e2.job
)

*I believe this should work everywhere, but I don't have the environments to test it.

*我认为这应该适用于任何地方,但我没有环境来测试它。

#3


1  

In Oracle I'd do it with an analytical function, so you'd only query the emp table once :

在Oracle中,我使用的是分析函数,所以您只需要查询emp表一次:

SELECT *
  FROM (SELECT e.*, MAX (sal) OVER () AS max_sal
          FROM scott.emp e
         WHERE deptno = 20 
           AND job = 'CLERK')
 WHERE sal = max_sal

It's simpler, easier to read and more efficient.

它更简单,更容易阅读和更有效率。

If you want to modify it to list list this information for all departments, then you'll need to use the "PARTITION BY" clause in OVER:

如果您想要修改它以列出所有部门的信息,那么您将需要使用“分区BY”子句:

SELECT *
  FROM (SELECT e.*, MAX (sal) OVER (PARTITION BY deptno) AS max_sal
          FROM scott.emp e
         WHERE job = 'CLERK')
 WHERE sal = max_sal
ORDER BY deptno

#4


0  

That's great! I didn't know you could do a comparison of (x, y, z) with the result of a SELECT statement. This works great with Oracle.

太好了!我不知道你可以对(x, y, z)和SELECT语句的结果进行比较。这对Oracle非常有效。

As a side-note for other readers, the above query is missing a "=" after "(deptno,job,sal)". Maybe the Stack Overflow formatter ate it (?).

作为对其他读者的补充说明,上面的查询缺少一个“=”after (deptno,job,sal)。可能堆栈溢出格式化程序将它(?)

Again, thanks Mark.

再一次,谢谢。

#5


0  

In Oracle you can also use the EXISTS statement, which in some cases is faster.

在Oracle中,您还可以使用exist语句,在某些情况下该语句速度更快。

For example... SELECT name, number FROM cust WHERE cust IN ( SELECT cust_id FROM big_table ) AND entered > SYSDATE -1 would be slow.

例如……选择name, cust中的number(从big_table中选择cust_id),输入> SYSDATE -1会很慢。

but SELECT name, number FROM cust c WHERE EXISTS ( SELECT cust_id FROM big_table WHERE cust_id=c.cust_id ) AND entered > SYSDATE -1 would be very fast with proper indexing. You can also use this with multiple parameters.

但是选择名称,从cust c中选择的名称(从cust_id=c的big_table中选择cust_id)。如果使用正确的索引,输入> SYSDATE -1将非常快。您还可以在多个参数中使用它。

#6


0  

There are many solutions. You could also keep your original query layout by simply adding table aliases and joining on the column names, you would still only have DEPTNO = 20 and JOB = 'CLERK' in the query once.

有许多解决方案。您还可以通过简单地添加表别名和加入列名来保持原来的查询布局,您仍然只在查询中有DEPTNO = 20和JOB = 'CLERK'。

SELECT 
  * 
FROM 
  scott.emp emptbl
WHERE
  emptbl.DEPTNO = 20 
  AND emptbl.JOB = 'CLERK'
  AND emptbl.SAL =  
    (
      select 
        max(salmax.SAL) 
      from 
        scott.emp salmax
      where 
        salmax.DEPTNO = emptbl.DEPTNO
        AND salmax.JOB = emptbl.JOB
    )

It could also be noted that the key word "ALL" can be used for these types of queries which would allow you to remove the "MAX" function.

还可以注意到,关键字“ALL”可以用于这些类型的查询,从而可以删除“MAX”函数。

SELECT 
  * 
FROM 
  scott.emp emptbl
WHERE
  emptbl.DEPTNO = 20 
  AND emptbl.JOB = 'CLERK'
  AND emptbl.SAL >= ALL  
    (
      select 
        salmax.SAL
      from 
        scott.emp salmax
      where 
        salmax.DEPTNO = emptbl.DEPTNO
        AND salmax.JOB = emptbl.JOB
    )

I hope that helps and makes sense.

我希望这有帮助,也有意义。

#1


3  

The following is slightly over-engineered, but is a good SQL pattern for "top x" queries.

下面的代码稍微有些夸张,但是对于“top x”查询来说,这是一个很好的SQL模式。

SELECT 
 * 
FROM 
 scott.emp
WHERE 
 (deptno,job,sal) IN
 (SELECT 
   deptno,
   job,
   max(sal) 
  FROM 
   scott.emp
  WHERE 
   deptno = 20 
   and job = 'CLERK'
  GROUP BY 
   deptno,
   job
  )

Also note that this will work in Oracle and Postgress (i think) but not MS SQL. For something similar in MS SQL see question SQL Query to get latest price

还需要注意的是,这在Oracle和Postgress(我认为)中可以工作,但在MS SQL中不能工作。有关MS SQL中的类似内容,请参阅问题SQL查询以获取最新价格

#2


2  

If I was certain of the targeted database I'd go with Mark Nold's solution, but if you ever want some dialect agnostic SQL*, try

如果我确定了目标数据库,我将使用Mark Nold的解决方案,但是如果您想要一些方言无关的SQL*,试试。

SELECT * 
FROM scott.emp e
WHERE e.deptno = 20 
AND e.job = 'CLERK'
AND e.sal = (
  SELECT MAX(e2.sal) 
  FROM scott.emp e2
  WHERE e.deptno = e2.deptno 
  AND e.job = e2.job
)

*I believe this should work everywhere, but I don't have the environments to test it.

*我认为这应该适用于任何地方,但我没有环境来测试它。

#3


1  

In Oracle I'd do it with an analytical function, so you'd only query the emp table once :

在Oracle中,我使用的是分析函数,所以您只需要查询emp表一次:

SELECT *
  FROM (SELECT e.*, MAX (sal) OVER () AS max_sal
          FROM scott.emp e
         WHERE deptno = 20 
           AND job = 'CLERK')
 WHERE sal = max_sal

It's simpler, easier to read and more efficient.

它更简单,更容易阅读和更有效率。

If you want to modify it to list list this information for all departments, then you'll need to use the "PARTITION BY" clause in OVER:

如果您想要修改它以列出所有部门的信息,那么您将需要使用“分区BY”子句:

SELECT *
  FROM (SELECT e.*, MAX (sal) OVER (PARTITION BY deptno) AS max_sal
          FROM scott.emp e
         WHERE job = 'CLERK')
 WHERE sal = max_sal
ORDER BY deptno

#4


0  

That's great! I didn't know you could do a comparison of (x, y, z) with the result of a SELECT statement. This works great with Oracle.

太好了!我不知道你可以对(x, y, z)和SELECT语句的结果进行比较。这对Oracle非常有效。

As a side-note for other readers, the above query is missing a "=" after "(deptno,job,sal)". Maybe the Stack Overflow formatter ate it (?).

作为对其他读者的补充说明,上面的查询缺少一个“=”after (deptno,job,sal)。可能堆栈溢出格式化程序将它(?)

Again, thanks Mark.

再一次,谢谢。

#5


0  

In Oracle you can also use the EXISTS statement, which in some cases is faster.

在Oracle中,您还可以使用exist语句,在某些情况下该语句速度更快。

For example... SELECT name, number FROM cust WHERE cust IN ( SELECT cust_id FROM big_table ) AND entered > SYSDATE -1 would be slow.

例如……选择name, cust中的number(从big_table中选择cust_id),输入> SYSDATE -1会很慢。

but SELECT name, number FROM cust c WHERE EXISTS ( SELECT cust_id FROM big_table WHERE cust_id=c.cust_id ) AND entered > SYSDATE -1 would be very fast with proper indexing. You can also use this with multiple parameters.

但是选择名称,从cust c中选择的名称(从cust_id=c的big_table中选择cust_id)。如果使用正确的索引,输入> SYSDATE -1将非常快。您还可以在多个参数中使用它。

#6


0  

There are many solutions. You could also keep your original query layout by simply adding table aliases and joining on the column names, you would still only have DEPTNO = 20 and JOB = 'CLERK' in the query once.

有许多解决方案。您还可以通过简单地添加表别名和加入列名来保持原来的查询布局,您仍然只在查询中有DEPTNO = 20和JOB = 'CLERK'。

SELECT 
  * 
FROM 
  scott.emp emptbl
WHERE
  emptbl.DEPTNO = 20 
  AND emptbl.JOB = 'CLERK'
  AND emptbl.SAL =  
    (
      select 
        max(salmax.SAL) 
      from 
        scott.emp salmax
      where 
        salmax.DEPTNO = emptbl.DEPTNO
        AND salmax.JOB = emptbl.JOB
    )

It could also be noted that the key word "ALL" can be used for these types of queries which would allow you to remove the "MAX" function.

还可以注意到,关键字“ALL”可以用于这些类型的查询,从而可以删除“MAX”函数。

SELECT 
  * 
FROM 
  scott.emp emptbl
WHERE
  emptbl.DEPTNO = 20 
  AND emptbl.JOB = 'CLERK'
  AND emptbl.SAL >= ALL  
    (
      select 
        salmax.SAL
      from 
        scott.emp salmax
      where 
        salmax.DEPTNO = emptbl.DEPTNO
        AND salmax.JOB = emptbl.JOB
    )

I hope that helps and makes sense.

我希望这有帮助,也有意义。