Does a GROUP BY clause automatically guarantee that the results will be ordered by that key? In other words, is it enough to write:
select *
from table
group by a, b, c
or does one have to write:
select *
from table
group by a, b, c
order by a, b, c
I know that, for example, in MySQL I don't have to, but I would like to know whether I can rely on it across SQL implementations. Is it guaranteed?
5 answers
#1
15
GROUP BY does not necessarily order the data. A database is designed to fetch the data as fast as possible and sorts only if necessary.
So add ORDER BY if you need a guaranteed order.
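As a minimal sketch of the advice above, the following Python snippet uses the standard-library sqlite3 module to run a grouped query with the ORDER BY written out explicitly. The table name sales and its columns a and amount are made up for illustration and do not come from the question.

```python
import sqlite3


def grouped_totals(rows):
    """Return (a, total) pairs, ordered by a via an explicit ORDER BY."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table used only for this illustration.
    conn.execute("CREATE TABLE sales (a TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT a, SUM(amount) FROM sales GROUP BY a ORDER BY a"
    ).fetchall()
    conn.close()
    return result


totals = grouped_totals([("c", 1), ("a", 2), ("b", 3), ("a", 4)])
print(totals)  # [('a', 6), ('b', 3), ('c', 1)]
```

With the ORDER BY in place, the order of the result no longer depends on how the engine happens to implement the grouping.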
#2
0
It definitely doesn't. I have seen one of my queries suddenly start returning unordered results as the data in the table grew.
#3
0
I tried it on the AdventureWorks sample database from MSDN.
select HireDate, min(JobTitle)
from AdventureWorks2016CTP3.HumanResources.Employee
group by HireDate
Results:
2009-01-10 Production Technician - WC40
2009-01-11 Application Specialist
2009-01-12 Assistant to the Chief Financial Officer
2009-01-13 Production Technician - WC50
It returns the data sorted by HireDate, but you should not rely on GROUP BY to sort under any circumstances.
For example, indexes can change this order.
I added the following index on (JobTitle, HireDate):
CREATE NONCLUSTERED INDEX NonClusturedIndex_Jobtitle_hireddate ON [HumanResources].[Employee]
(
[JobTitle] ASC,
[HireDate] ASC
)
The result changes for the same SELECT query:
2006-06-30 Production Technician - WC60
2007-01-26 Marketing Assistant
2007-11-11 Engineering Manager
2007-12-05 Senior Tool Designer
2007-12-11 Tool Designer
2007-12-20 Marketing Manager
2007-12-26 Production Supervisor - WC60
You can download AdventureWorks2016 at the following address:
https://www.microsoft.com/en-us/download/details.aspx?id=49502
#4
0
An efficient implementation of GROUP BY may perform the grouping by sorting the data internally. That's why some RDBMSs return sorted output when grouping. Yet the SQL specs don't mandate that behavior, so unless it is explicitly documented by the RDBMS vendor, I wouldn't bet on it working (tomorrow). On the other hand, if the RDBMS implicitly does a sort, it might also be smart enough to optimize away the redundant ORDER BY. @jimmyb
An example using PostgreSQL that illustrates this.
Create a table with 1M records, with random dates within the last 90 days, and index it by date:
CREATE TABLE WITHDRAW AS
SELECT (random()*1000000)::integer AS IDT_WITHDRAW,
md5(random()::text) AS NAM_PERSON,
(NOW() - ( random() * (NOW() + '90 days' - NOW()) ))::timestamp AS DAT_CREATION, -- from today back to 90 days ago
(random() * 1000)::decimal(12, 2) AS NUM_VALUE
FROM generate_series(1,1000000);
CREATE INDEX WITHDRAW_DAT_CREATION ON WITHDRAW(DAT_CREATION);
Group by the date truncated to the day, restricting the select to dates between two days ago and one day ago. The planner picks a HashAggregate, with no sort step:
EXPLAIN
SELECT
DATE_TRUNC('DAY', W.dat_creation), COUNT(1), SUM(W.NUM_VALUE)
FROM WITHDRAW W
WHERE W.dat_creation >= (NOW() - INTERVAL '2 DAY')::timestamp
AND W.dat_creation < (NOW() - INTERVAL '1 DAY')::timestamp
GROUP BY 1
HashAggregate (cost=11428.33..11594.13 rows=11053 width=48)
Group Key: date_trunc('DAY'::text, dat_creation)
-> Bitmap Heap Scan on withdraw w (cost=237.73..11345.44 rows=11053 width=14)
Recheck Cond: ((dat_creation >= ((now() - '2 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
-> Bitmap Index Scan on withdraw_dat_creation (cost=0.00..234.97 rows=11053 width=0)
Index Cond: ((dat_creation >= ((now() - '2 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
With a larger date range, the planner chooses to apply a Sort:
EXPLAIN
SELECT
DATE_TRUNC('DAY', W.dat_creation), COUNT(1), SUM(W.NUM_VALUE)
FROM WITHDRAW W
WHERE W.dat_creation >= (NOW() - INTERVAL '60 DAY')::timestamp
AND W.dat_creation < (NOW() - INTERVAL '1 DAY')::timestamp
GROUP BY 1
GroupAggregate (cost=116522.65..132918.32 rows=655827 width=48)
Group Key: (date_trunc('DAY'::text, dat_creation))
-> Sort (cost=116522.65..118162.22 rows=655827 width=14)
Sort Key: (date_trunc('DAY'::text, dat_creation))
-> Seq Scan on withdraw w (cost=0.00..41949.57 rows=655827 width=14)
Filter: ((dat_creation >= ((now() - '60 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
Adding ORDER BY 1 at the end makes no significant difference to the plan:
GroupAggregate (cost=116522.44..132918.06 rows=655825 width=48)
Group Key: (date_trunc('DAY'::text, dat_creation))
-> Sort (cost=116522.44..118162.00 rows=655825 width=14)
Sort Key: (date_trunc('DAY'::text, dat_creation))
-> Seq Scan on withdraw w (cost=0.00..41949.56 rows=655825 width=14)
Filter: ((dat_creation >= ((now() - '60 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
PostgreSQL 10.3
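The two plan shapes above, HashAggregate versus Sort followed by GroupAggregate, can be imitated in a few lines of Python to see why only one of them yields ordered groups. This is a rough sketch of the idea, not PostgreSQL's actual implementation. The function names and sample rows are made up, and a Python dict (which keeps insertion order) stands in for a database hash table whose iteration order is in general unspecified.

```python
from itertools import groupby
from operator import itemgetter


def hash_aggregate(rows):
    """Sum values per key using a hash table; no sorting happens."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    # Groups come out in first-seen order here, not in key order.
    return list(totals.items())


def sort_aggregate(rows):
    """Sort by key, then sum each run of consecutive equal keys."""
    ordered = sorted(rows, key=itemgetter(0))
    return [
        (key, sum(value for _, value in run))
        for key, run in groupby(ordered, key=itemgetter(0))
    ]


# Hypothetical (day, amount) rows, deliberately out of order.
rows = [
    ("2018-03-03", 10),
    ("2018-03-01", 5),
    ("2018-03-02", 7),
    ("2018-03-01", 1),
]

print(hash_aggregate(rows))
# [('2018-03-03', 10), ('2018-03-01', 6), ('2018-03-02', 7)]
print(sort_aggregate(rows))
# [('2018-03-01', 6), ('2018-03-02', 7), ('2018-03-03', 10)]
```

Both functions compute the same groups and sums. Only the sort-based one returns them in key order, and that is a side effect of the strategy rather than something the query asked for.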
#5
-1
It depends on the number of records. When there are few records, GROUP BY sorts automatically. When there are more records (more than 15), an ORDER BY clause is required.