Why can't I mix aggregate and non-aggregate values in a single SELECT?

Time: 2021-11-30 01:28:40

I know that if you have one aggregate function in a SELECT statement, then all the other values in the statement must be either aggregate functions, or listed in a GROUP BY clause. I don't understand why that's the case.

If I do:

SELECT Name, 'Jones' AS Surname FROM People

I get:

NAME    SURNAME
Dave    Jones
Susan   Jones
Amy     Jones

So, the DBMS has taken a value from each row, and appended a single value to it in the result set. That's fine. But if that works, why can't I do:

SELECT Name, COUNT(Name) AS Surname FROM People

It seems like the same idea: take a value from each row and append a single value. But instead of:

NAME    SURNAME
Dave    3
Susan   3
Amy     3    

I get:

You tried to execute a query that does not include the specified expression 'ContactName' as part of an aggregate function.

I know it's not allowed, but the two circumstances seem so similar that I don't understand why. Is it to make the DBMS easier to implement? If anyone can explain to me why it doesn't work like I think it should, I'd be very grateful.

6 answers

#1


11  

Aggregates don't work on a complete result; they only work on a group within a result.

Consider a table containing:

Person   Pet
-------- --------
Amy      Cat
Amy      Dog
Amy      Canary
Dave     Dog
Susan    Snake
Susan    Spider

If you use a query that groups on Person, it will divide the data into these groups:

Amy:
  Amy    Cat
  Amy    Dog
  Amy    Canary
Dave:
  Dave   Dog
Susan:
  Susan  Snake
  Susan  Spider

If you use an aggregate, for example the count aggregate, it will produce one result for each group:

Amy:
  Amy    Cat
  Amy    Dog
  Amy    Canary    count(*) = 3
Dave:
  Dave   Dog       count(*) = 1
Susan:
  Susan  Snake
  Susan  Spider    count(*) = 2

So, the query select Person, count(*) from People group by Person gives you one record for each group:

Amy    3
Dave   1
Susan  2

If you try to get the Pet field in the result also, that doesn't work because there may be multiple values for that field in each group.

(Some databases, like MySQL with ONLY_FULL_GROUP_BY disabled, do allow that anyway and just return an arbitrary value from within the group; it's your responsibility to know whether the result is sensible.)

If you use an aggregate but don't specify any grouping, the query will still be grouped, and the entire result is a single group. So the query select count(*) from People will create a single group containing all records, and the aggregate can count the records in that group. The result contains one row for each group, and as there is only one group, there will be one row in the result.
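The grouping behaviour described above can be tried out with a small sketch. It uses Python's built-in sqlite3 module purely as a convenient test bed (my assumption, not the asker's DBMS); the table name People and the columns Person and Pet follow the example in this answer.

```python
import sqlite3

# In-memory database holding the Person/Pet example table from this answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Person TEXT, Pet TEXT)")
conn.executemany(
    "INSERT INTO People VALUES (?, ?)",
    [
        ("Amy", "Cat"), ("Amy", "Dog"), ("Amy", "Canary"),
        ("Dave", "Dog"),
        ("Susan", "Snake"), ("Susan", "Spider"),
    ],
)

# One output row per group when grouping on Person.
per_person = conn.execute(
    "SELECT Person, COUNT(*) FROM People GROUP BY Person ORDER BY Person"
).fetchall()

# No GROUP BY: the whole table is one group, so exactly one row comes back.
whole_table = conn.execute("SELECT COUNT(*) FROM People").fetchall()

print(per_person)
print(whole_table)
```

The first query should return one row per person, and the second a single row holding the total, which is the "one group, one row" rule in action.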

#2


7  

Think about it this way: when you call COUNT without grouping, it "collapses" the table to a single group making it impossible to access the individual items within a group in a select clause.

You can still get your result using a subquery or a cross join:

    SELECT p1.Name, COUNT(p2.Name) AS Surname FROM People p1 CROSS JOIN People p2 GROUP BY p1.Name

    SELECT Name, (SELECT COUNT(Name) FROM People) AS Surname FROM People
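As a rough check that the two queries above give the result the asker wanted, here is a sketch that runs both against the three-row People table from the question. Using Python's sqlite3 module is my assumption; the SQL itself is unchanged.

```python
import sqlite3

# The three names from the question, in an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Name TEXT)")
conn.executemany("INSERT INTO People VALUES (?)", [("Dave",), ("Susan",), ("Amy",)])

# Cross join: every row is paired with every row, then grouped back by name.
cross_join = conn.execute(
    "SELECT p1.Name, COUNT(p2.Name) AS Surname "
    "FROM People p1 CROSS JOIN People p2 GROUP BY p1.Name"
).fetchall()

# Scalar subquery: the total is computed once and attached to each row.
subquery = conn.execute(
    "SELECT Name, (SELECT COUNT(Name) FROM People) AS Surname FROM People"
).fetchall()

print(sorted(cross_join))
print(sorted(subquery))
```

Note that the two forms only agree when the names are unique: the cross-join version groups by Name, so duplicate names would be collapsed into one row with an inflated count, while the subquery version keeps every row.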

#3


5  

As others explained, when you have a GROUP BY or you are using an aggregate function like COUNT() in the SELECT list, you are doing a grouping of rows and therefore collapsing matching rows into one for every group.

When you only use aggregate functions in the SELECT list, without GROUP BY, think of it as if you had grouped by a constant (a GROUP BY 1 in spirit), so all rows are grouped and collapsed into one. So, if you have a hundred rows, the database can't really show you a name, as there are a hundred of them.

However, for RDBMSs that have "windowing" functions, what you want is feasible: you can use aggregate functions with an OVER clause instead of a GROUP BY.

Example for SQL-Server, where all rows (names) in the table are counted:

SELECT Name
     , COUNT(*) OVER() AS cnt
FROM People

How does the above work?

  • It shows the Name as if the COUNT(*) OVER() AS cnt did not exist, and

  • It shows the COUNT(*) as if it were computed over a total grouping of the table.


Another example: if the table has a Surname field, you can use something like this to show every row along with a count of how many people share the same Surname:

SELECT Name
     , Surname
     , COUNT(*) OVER(PARTITION BY Surname) AS cnt
FROM People
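Both window queries above can be tried outside SQL Server too. The sketch below uses Python's sqlite3 module, which is my assumption; SQLite only supports window functions from version 3.25, so the queries are guarded by a version check. The surnames are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Name TEXT, Surname TEXT)")
# Hypothetical sample data: two people named Jones, one named Smith.
conn.executemany(
    "INSERT INTO People VALUES (?, ?)",
    [("Dave", "Jones"), ("Susan", "Jones"), ("Amy", "Smith")],
)

# Window functions need SQLite 3.25 or newer.
supports_windows = sqlite3.sqlite_version_info >= (3, 25, 0)
totals, per_surname = [], []
if supports_windows:
    # Every row is kept; the count is taken over the whole table.
    totals = conn.execute(
        "SELECT Name, COUNT(*) OVER () AS cnt FROM People ORDER BY Name"
    ).fetchall()
    # Every row is kept; the count is taken within each Surname partition.
    per_surname = conn.execute(
        "SELECT Name, Surname, COUNT(*) OVER (PARTITION BY Surname) AS cnt "
        "FROM People ORDER BY Name"
    ).fetchall()

print(totals)
print(per_surname)
```

Unlike GROUP BY, the OVER clause does not collapse rows, which is why Name can sit next to the aggregate here.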

#4


2  

Your query implicitly asks for different types of rows in your result set, and that is not allowed. All rows returned should be of the same type and have the same kind of columns.

'SELECT name, surname' wants to return a row for every row in the table.

'SELECT COUNT(*)' wants to return a single row combining the results of all the rows in the table.

I think you're correct that in this case the database could plausibly just do both queries and then copy the result of 'SELECT COUNT(*)' into every result. One reason for not doing this is that it would be a stealth performance hit: you'd effectively be doing an extra self-join without declaring it anywhere.

Other answers have explained how to write a working version of this query, so I won't go into that.

#5


1  

The aggregate function and the group by clause aren't separate things, they're parts of the same thing that appear in different places in the query. If you wish to aggregate on a column, you must say what function to use for aggregation; if you wish to have an aggregation function, it has to be applied over some column.

#6


1  

The aggregate function takes values from multiple rows with a specific condition and combines them into one value. This condition is defined by the GROUP BY in your statement. So you can't combine an aggregate function with a non-aggregated column without a GROUP BY.

With

SELECT Name, 'Jones' AS Surname FROM People  

you simply select an additional column with a fixed value... but with

SELECT Name, COUNT(Name) AS Surname FROM People GROUP BY Name

you tell the DBMS to select the Names, remember how often every Name occurred in the table, and collapse each group of equal Names into one row. So if you omit the GROUP BY, the DBMS can't tell how to collapse the records.
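To see the difference the GROUP BY makes, here is a small sketch using Python's sqlite3 module (my choice of test bed, with a second "Dave" added as hypothetical data so the counts differ). SQLite happens to be one of the lenient engines: instead of raising the error from the question, it accepts the ungrouped query and collapses everything into one row with an arbitrarily chosen Name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Name TEXT)")
# Hypothetical data: the names from the question plus a duplicate "Dave".
names = [("Dave",), ("Susan",), ("Amy",), ("Dave",)]
conn.executemany("INSERT INTO People VALUES (?)", names)

# With GROUP BY: one row per distinct Name, each with its own count.
grouped = conn.execute(
    "SELECT Name, COUNT(Name) FROM People GROUP BY Name ORDER BY Name"
).fetchall()

# Without GROUP BY: SQLite treats the table as one group and returns a
# single row; which Name appears in it is not specified.
ungrouped = conn.execute("SELECT Name, COUNT(Name) FROM People").fetchall()

print(grouped)
print(ungrouped)
```

A stricter DBMS rejects the second query outright, as in the question, rather than picking a Name for you.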
