Let's assume I have a table called Cars with two columns: CarName and BrandName.
Now I want to execute this query:
select CarName
from Cars
order by BrandName
As you can see, I'd like to return a list sorted by a column that is not present in the select part of the query.
The basic (not optimized) execution sequence of SQL commands is: from, where, group by, having, select, order by.
The problem is that BrandName isn't part of what is left after the select command has been executed.
I've searched for this in books, on Google and on Stack Overflow, but so far I've only found several SO comments like "I know of a database system that doesn't allow it, but I don't remember which one".
So my questions are:
1) What do the standards SQL-92 or SQL99 say about this?
2) Which databases allow this query and which don't?
(Background: A couple of students asked this, and I want to give them the best answer possible.)
EDIT:
- Successfully tested on Microsoft SQL Server 2012
2 Answers
#1
7
Your query is perfectly legal syntax; you can order by columns that are not present in the select.
- Working Demo with MySQL
- Working Demo with SQL Server
- Working Demo with Postgresql
- Working Demo with SQLite
- Working Demo with Oracle
If you need the full specification of legal ordering, the SQL Standard 2003 has a long list of rules about what the order by should and shouldn't contain (02-Foundation, page 415, section 7.13 <query expression>, sub part 28). This confirms that your query is legal syntax.
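To try this without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names come from the question; the three sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cars (CarName TEXT, BrandName TEXT)")
conn.executemany(
    "INSERT INTO Cars (CarName, BrandName) VALUES (?, ?)",
    [("Golf", "Volkswagen"), ("Yaris", "Toyota"), ("TT", "Audi")],
)

# BrandName is used for sorting but is not part of the select list.
rows = conn.execute("SELECT CarName FROM Cars ORDER BY BrandName").fetchall()
car_names = [name for (name,) in rows]
print(car_names)
```

The names come back in brand order (Audi, Toyota, Volkswagen) rather than in CarName order, so the sort column is clearly still available to the order by even though it is not returned.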
I think your confusion could arise from two related cases: selecting or ordering by columns that are not present in the group by, or ordering by columns that are not in the select when using distinct.
Both have the same fundamental problem, and MySQL is the only DBMS I know of that allows either.
The problem is that, when using group by or distinct, the columns not contained in either are not needed to build the result, so it normally doesn't matter that they hold several different values across the rows being collapsed. It starts to matter once you select or order by one of them. Imagine this simple data set:
ID | Column1 | Column2
---+---------+--------
1  | A       | X
2  | A       | Z
3  | B       | Y
If you write:
SELECT DISTINCT Column1
FROM T;
You would get
Column1
---------
A
B
If you then add ORDER BY Column2, which of the two Column2 values would you use to order A by, X or Z? There is no deterministic way to choose a value for Column2.
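One way to remove the ambiguity is to say explicitly which Column2 value should represent each group, by grouping and ordering by an aggregate. This is a sketch of that alternative, not something the distinct query does for you; it uses Python's built-in sqlite3 module with the sample data above.

```python
import sqlite3

# Same sample data as the table above, in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ID INTEGER PRIMARY KEY, Column1 TEXT, Column2 TEXT)")
conn.executemany(
    "INSERT INTO T (ID, Column1, Column2) VALUES (?, ?, ?)",
    [(1, "A", "X"), (2, "A", "Z"), (3, "B", "Y")],
)

# Order each Column1 value by its smallest Column2 (A -> X, B -> Y).
by_min = [c for (c,) in conn.execute(
    "SELECT Column1 FROM T GROUP BY Column1 ORDER BY MIN(Column2)"
)]

# Order each Column1 value by its largest Column2 (A -> Z, B -> Y).
by_max = [c for (c,) in conn.execute(
    "SELECT Column1 FROM T GROUP BY Column1 ORDER BY MAX(Column2)"
)]

print(by_min, by_max)
```

The two queries return the same values in opposite orders, which is exactly why a plain ORDER BY Column2 on the distinct result has no single correct answer.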
The same applies to selecting columns that are not in the group by. To simplify things, just imagine the first two rows of the previous table:
ID | Column1 | Column2
---+---------+--------
1  | A       | X
2  | A       | Z
In MySQL (with the ONLY_FULL_GROUP_BY SQL mode disabled) you can write:
SELECT ID, Column1, Column2
FROM T
GROUP BY Column1;
This actually breaks the SQL standard, but it works in MySQL. The trouble is that it is non-deterministic. The result:
ID | Column1 | Column2
---+---------+--------
1  | A       | X
is no more or less correct than:
ID | Column1 | Column2
---+---------+--------
2  | A       | Z
What you are asking for is one row for each distinct value of Column1, which both result sets satisfy, so how do you know which one you will get? You don't. It seems to be a fairly popular misconception that you can add an ORDER BY clause to influence the result, so that, for example, the following query:
SELECT ID, Column1, Column2
FROM T
GROUP BY Column1
ORDER BY ID DESC;
would ensure that you get the following result:
ID | Column1 | Column2
---+---------+--------
2  | A       | Z
because of the ORDER BY ID DESC. However, this is not true.
The MySQL documentation states:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause.
So even though you have an order by, it is not applied until after one row per group has been selected, and that one row is non-deterministic.
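If what you actually want is the row with the highest ID in each group, one deterministic option is to compute MAX(ID) per group and join back to the table. This is a sketch of that approach under the sample data above, run through Python's built-in sqlite3 module; the aliases t, m and MaxID are names chosen for illustration.

```python
import sqlite3

# Same sample data as the first table, in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ID INTEGER PRIMARY KEY, Column1 TEXT, Column2 TEXT)")
conn.executemany(
    "INSERT INTO T (ID, Column1, Column2) VALUES (?, ?, ?)",
    [(1, "A", "X"), (2, "A", "Z"), (3, "B", "Y")],
)

# Pick the highest ID per Column1 explicitly, then fetch the full row for it.
latest_rows = conn.execute(
    """
    SELECT t.ID, t.Column1, t.Column2
    FROM T AS t
    JOIN (SELECT Column1, MAX(ID) AS MaxID
          FROM T
          GROUP BY Column1) AS m
      ON t.ID = m.MaxID
    ORDER BY t.ID DESC
    """
).fetchall()
print(latest_rows)
```

Here the choice of row is made by the aggregate in the subquery, and the outer order by only sorts rows that have already been chosen.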
The SQL standard does allow columns in the select list that are contained in neither the GROUP BY nor an aggregate function, but these columns must be functionally dependent on a column in the GROUP BY. From the SQL-2003 standard (5WD-02-Foundation-2003-09, page 346), http://www.wiscorp.com/sql_2003_standard.zip:
15) If T is a grouped table, then let G be the set of grouping columns of T. In each <value expression> contained in <select list>, each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a <set function specification> whose aggregation query is QS.
For example, ID in the sample table is the PRIMARY KEY, so we know it is unique in the table. The following query therefore conforms to the SQL standard; it runs in MySQL but currently fails in many DBMSs (at the time of writing, PostgreSQL is the closest DBMS I know of to correctly implementing the standard):
SELECT ID, Column1, Column2
FROM T
GROUP BY ID;
Since ID is unique for each row, there can only be one value of Column1 and one value of Column2 for each ID, so there is no ambiguity about what to return for each row.
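A quick way to see this is to count the rows in each group. The sketch below uses Python's built-in sqlite3 module; note that SQLite accepts non-grouped columns in any case, so it only illustrates that each group holds exactly one row, not the functional-dependency check the standard describes. RowsInGroup is an alias added for illustration.

```python
import sqlite3

# Same sample data as the first table, with ID declared as the primary key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ID INTEGER PRIMARY KEY, Column1 TEXT, Column2 TEXT)")
conn.executemany(
    "INSERT INTO T (ID, Column1, Column2) VALUES (?, ?, ?)",
    [(1, "A", "X"), (2, "A", "Z"), (3, "B", "Y")],
)

# Grouping by the primary key: every group contains exactly one row,
# so Column1 and Column2 each have a single possible value per group.
grouped = conn.execute(
    """
    SELECT ID, Column1, Column2, COUNT(*) AS RowsInGroup
    FROM T
    GROUP BY ID
    ORDER BY ID
    """
).fetchall()
print(grouped)
```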
#2
1
There's no logical reason why any RDBMS wouldn't let you do this. The usual restriction relates to SELECT DISTINCT, or the presence of a GROUP BY clause.
Current list of RDBMS known to support this:
- Microsoft SQL Server 2012
- Oracle
- PostgreSQL
- MySQL
- DB2