I've been using GROUP BY
for all types of aggregate queries over the years. Recently, I've been reverse-engineering some code that uses PARTITION BY
to perform aggregations. In reading through all the documentation I can find about PARTITION BY
, it sounds a lot like GROUP BY
, maybe with a little extra functionality added in? Are they two versions of the same general functionality, or are they something different entirely?
多年来,我一直在对所有类型的聚合查询使用GROUP BY。最近,我反向工程了一些使用分区BY执行聚合的代码。在阅读我能找到的关于PARTITION BY的所有文档时,它听起来很像GROUP BY,可能会添加一些额外的功能?它们是同一个通用功能的两个版本,还是完全不同?
10 个解决方案
#1
290
They're used in different places. group by
modifies the entire query, like:
它们在不同的地方使用。通过修改整个查询,例如:
select customerId, count(*) as orderCount
from Orders
group by customerId
But partition by
just works on a window function, like row_number
:
但是分区by只作用于窗口函数,比如row_number:
select row_number() over (partition by customerId order by orderId)
as OrderNumberForThisCustomer
from Orders
A group by
normally reduces the number of rows returned by rolling them up and calculating averages or sums for each row. partition by
does not affect the number of rows returned, but it changes how a window function's result is calculated.
通常,一个组通过将它们卷起来并计算每一行的平均值或总数来减少返回的行数。分区by不会影响返回的行数,但它会改变计算窗口函数结果的方式。
#2
144
We can take a simple example
我们可以举一个简单的例子
we have a table named TableA
with the following values .
我们有一个名为TableA的表,其值如下。
id firstname lastname Mark
-------------------------------------------------------------------
1 arun prasanth 40
2 ann antony 45
3 sruthy abc 41
6 new abc 47
1 arun prasanth 45
1 arun prasanth 49
2 ann antony 49
Group By
集团
The SQL GROUP BY clause can be used in a SELECT statement to collect data across multiple records and group the results by one or more columns.
可以在SELECT语句中使用SQL GROUP BY子句收集跨多个记录的数据,并将结果按一个或多个列分组。
In more simple words GROUP BY statement is used in conjunction with the aggregate functions to group the result-set by one or more columns.
在更简单的词中,GROUP BY语句与聚合函数一起使用,将结果集按一个或多个列进行分组。
syntax :
语法:
SELECT expression1, expression2, ... expression_n,
aggregate_function (aggregate_expression)
FROM tables
WHERE conditions
GROUP BY expression1, expression2, ... expression_n;
We can apply GroupBy in our table
我们可以在表格中应用GroupBy
select SUM(Mark)marksum,firstname from TableA
group by id,firstName
Results :
结果:
marksum firstname
----------------
94 ann
134 arun
47 new
41 sruthy
In our real table we have 7 rows and when we apply group by id, the server group the results based on id
在实际的表中,我们有7行,当我们按id应用组时,服务器根据id对结果进行分组
In simple words
用简单的单词
here group by normally reduces the number of rows returned by rolling them up and calculating Sum for each row.
在这里,group by通常通过将它们卷起来并计算每一行的总和来减少返回的行数。
partition by
分区的
before going to partition by
在去分区之前
let us look at OVER clause
让我们看一下上面的条款
As per MSDN definition
根据MSDN的定义
OVER clause defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. You can use the OVER clause with functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.
OVER子句在查询结果集中定义一个窗口或用户指定的一组行,然后窗口函数为窗口中的每一行计算一个值。可以使用OVER子句使用函数来计算聚合值,例如移动平均、累积聚合、运行总数,或每组结果的前N。
partition by will not reduce the number of rows returned
分区by不会减少返回的行数
we can apply partition by in our example table
我们可以在示例表中应用partition by
select SUM(Mark) OVER (PARTITION BY id) AS marksum, firstname from TableA
result :
结果:
marksum firstname
-------------------
134 arun
134 arun
134 arun
94 ann
94 ann
41 sruthy
47 new
look at the results it will partition the rows and results all rows not like group by.
看看它将对行进行划分的结果,并得出与group by不同的所有行。
#3
42
partition by
doesn't actually roll up the data. It allows you to reset something on a per group basis. For example, you can get an ordinal column within a group by partitioning on the grouping field and using rownum()
over the rows within that group. This gives you something that behaves a bit like an identity column that resets at the beginning of each group.
分区by实际上并没有卷起数据。它允许你在每个组的基础上重置一些东西。例如,您可以在分组字段上进行分区,并在组内的行上使用rownum(),从而获得组中的序号列。这给了您一些行为有点像标识列的东西,标识列在每个组的开头重置。
#4
33
PARTITION BY Divides the result set into partitions. The window function is applied to each partition separately and computation restarts for each partition.
通过将结果集划分为分区进行分区。窗口函数分别应用于每个分区,每个分区的计算重新启动。
Found at this link: OVER Clause
在这个链接中找到:OVER子句
#5
23
It provides rolled-up data without rolling up
它提供了卷起来的数据,而没有卷起来。
i.e. Suppose I want to return the relative position of sales region
例如,假设我想返回销售区域的相对位置
Using PARTITION BY, I can return the sales amount for a given region and the MAX amount across all sales regions in the same row.
使用PARTITION BY,我可以返回给定区域的销售额和同一行的所有销售区域的最大销售额。
This does mean you will have repeating data, but it may suit the end consumer in the sense that data has been aggregated but no data has been lost - as would be the case with GROUP BY.
这确实意味着您将拥有重复的数据,但它可能适合终端消费者,因为数据已经被聚合,但没有数据丢失——就像GROUP BY的情况一样。
#6
21
PARTITION BY
is analytic, while GROUP BY
is aggregate. In order to use PARTITION BY
, you have to contain it with an OVER clause.
按是解析的,按是聚合的。为了使用PARTITION BY,必须包含一个OVER子句。
#7
18
As of my understanding Partition By is almost identical to Group By, but with the following differences:
根据我的理解,划分By与Group By几乎相同,但有以下区别:
That group by actually groups the result set returning one row per group, which results therefore in SQL Server only allowing in the SELECT list aggregate functions or columns that are part of the group by clause (in which case SQL Server can guarantee that there are unique results for each group).
这组的组每组结果集返回一行,因此导致在SQL Server中只允许选择聚合函数或列列表group by子句的一部分(在这种情况下,SQL服务器可以保证有独特的结果为每个组)。
Consider for example MySQL which allows to have in the SELECT list columns that are not defined in the Group By clause, in which case one row is still being returned per group, however if the column doesn't have unique results then there is no guarantee what will be the output!
考虑例如MySQL可以选择列表中的列不是Group By子句中定义的,在这种情况下,仍被返回的每一行集团,然而如果列没有独特的结果就没有保证会输出什么!
But with Partition By, although the results of the function are identical to the results of an aggregate function with Group By, still you are getting the normal result set, which means that one is getting one row per underlying row, and not one row per group, and because of this one can have columns that are not unique per group in the SELECT list.
但与分区,尽管函数的结果是相同的结果的聚合函数组,你仍然得到正常的结果集,这意味着潜在一正一行一行,而不是每组一行,因为这一列,每组在选择列表中并不是唯一的。
So as a summary, Group By would be best when needs an output of one row per group, and Partition By would be best when one needs all the rows but still wants the aggregate function based on a group.
综上所述,当需要每个组输出一行时,Group By是最好的,而当需要所有行但仍然需要基于Group的聚合函数时,Partition By是最好的。
Of course there might also be performance issues, see http://social.msdn.microsoft.com/Forums/ms-MY/transactsql/thread/0b20c2b5-1607-40bc-b7a7-0c60a2a55fba.
当然也可能存在性能问题,请参阅http://social.msdn.microsoft.com/Forums/ms-MY/transactsql/thread/0b20c2b5-1607-40bc-b7a7-0c60a2a55fba。
#8
0
Suppose we have 14 records of name
column in table
假设表中有14条name列记录
in group by
在集团
select name,count(*) as totalcount from person where name='Please fill out' group BY name;
it will give count in single row i.e 14
它会在单行i中给出count。e 14
but in partition by
但在分区
select row_number() over (partition by name) as total from person where name = 'Please fill out';
it will 14 rows of increase in count
它将增加14行计数
#9
0
Small observation. Automation mechanism to dynamically generate SQL using the 'partition by' it is much simpler to implement in relation to the 'group by'. In the case of 'group by', We must take care of the content of 'select' column.
小的观察。使用“分区by”动态生成SQL的自动化机制,相对于“group by”实现起来要简单得多。对于“group by”,我们必须注意“select”列的内容。
Sorry for My English.
对不起,我的英语。
#10
-1
-- BELOW IS A SAMPLE WHICH OUTLINES THE SIMPLE DIFFERENCES
-- READ IT AND THEN EXECUTE IT
-- THERE ARE THREE ROWS OF EACH COLOR INSERTED INTO THE TABLE
-- CREATE A database called testDB
-- use testDB
USE [TestDB]
GO
-- create Paints table
CREATE TABLE [dbo].[Paints](
[Color] [varchar](50) NULL,
[glossLevel] [varchar](50) NULL
) ON [PRIMARY]
GO
-- Populate Table
insert into paints (color, glossLevel)
select 'red', 'eggshell'
union
select 'red', 'glossy'
union
select 'red', 'flat'
union
select 'blue', 'eggshell'
union
select 'blue', 'glossy'
union
select 'blue', 'flat'
union
select 'orange', 'glossy'
union
select 'orange', 'flat'
union
select 'orange', 'eggshell'
union
select 'green', 'eggshell'
union
select 'green', 'glossy'
union
select 'green', 'flat'
union
select 'black', 'eggshell'
union
select 'black', 'glossy'
union
select 'black', 'flat'
union
select 'purple', 'eggshell'
union
select 'purple', 'glossy'
union
select 'purple', 'flat'
union
select 'salmon', 'eggshell'
union
select 'salmon', 'glossy'
union
select 'salmon', 'flat'
/* COMPARE 'GROUP BY' color to 'OVER (PARTITION BY Color)' */
-- GROUP BY Color
-- row quantity defined by group by
-- aggregate (count(*)) defined by group by
select count(*) from paints
group by color
-- OVER (PARTITION BY... Color
-- row quantity defined by main query
-- aggregate defined by OVER-PARTITION BY
select color
, glossLevel
, count(*) OVER (Partition by color)
from paints
/* COMPARE 'GROUP BY' color, glossLevel to 'OVER (PARTITION BY Color, GlossLevel)' */
-- GROUP BY Color, GlossLevel
-- row quantity defined by GROUP BY
-- aggregate (count(*)) defined by GROUP BY
select count(*) from paints
group by color, glossLevel
-- Partition by Color, GlossLevel
-- row quantity defined by main query
-- aggregate (count(*)) defined by OVER-PARTITION BY
select color
, glossLevel
, count(*) OVER (Partition by color, glossLevel)
from paints
#1
290
They're used in different places. group by
modifies the entire query, like:
它们在不同的地方使用。通过修改整个查询,例如:
select customerId, count(*) as orderCount
from Orders
group by customerId
But partition by
just works on a window function, like row_number
:
但是分区by只作用于窗口函数,比如row_number:
select row_number() over (partition by customerId order by orderId)
as OrderNumberForThisCustomer
from Orders
A group by
normally reduces the number of rows returned by rolling them up and calculating averages or sums for each row. partition by
does not affect the number of rows returned, but it changes how a window function's result is calculated.
通常,一个组通过将它们卷起来并计算每一行的平均值或总数来减少返回的行数。分区by不会影响返回的行数,但它会改变计算窗口函数结果的方式。
#2
144
We can take a simple example
我们可以举一个简单的例子
we have a table named TableA
with the following values .
我们有一个名为TableA的表,其值如下。
id firstname lastname Mark
-------------------------------------------------------------------
1 arun prasanth 40
2 ann antony 45
3 sruthy abc 41
6 new abc 47
1 arun prasanth 45
1 arun prasanth 49
2 ann antony 49
Group By
集团
The SQL GROUP BY clause can be used in a SELECT statement to collect data across multiple records and group the results by one or more columns.
可以在SELECT语句中使用SQL GROUP BY子句收集跨多个记录的数据,并将结果按一个或多个列分组。
In more simple words GROUP BY statement is used in conjunction with the aggregate functions to group the result-set by one or more columns.
在更简单的词中,GROUP BY语句与聚合函数一起使用,将结果集按一个或多个列进行分组。
syntax :
语法:
SELECT expression1, expression2, ... expression_n,
aggregate_function (aggregate_expression)
FROM tables
WHERE conditions
GROUP BY expression1, expression2, ... expression_n;
We can apply GroupBy in our table
我们可以在表格中应用GroupBy
select SUM(Mark)marksum,firstname from TableA
group by id,firstName
Results :
结果:
marksum firstname
----------------
94 ann
134 arun
47 new
41 sruthy
In our real table we have 7 rows and when we apply group by id, the server group the results based on id
在实际的表中,我们有7行,当我们按id应用组时,服务器根据id对结果进行分组
In simple words
用简单的单词
here group by normally reduces the number of rows returned by rolling them up and calculating Sum for each row.
在这里,group by通常通过将它们卷起来并计算每一行的总和来减少返回的行数。
partition by
分区的
before going to partition by
在去分区之前
let us look at OVER clause
让我们看一下上面的条款
As per MSDN definition
根据MSDN的定义
OVER clause defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. You can use the OVER clause with functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.
OVER子句在查询结果集中定义一个窗口或用户指定的一组行,然后窗口函数为窗口中的每一行计算一个值。可以使用OVER子句使用函数来计算聚合值,例如移动平均、累积聚合、运行总数,或每组结果的前N。
partition by will not reduce the number of rows returned
分区by不会减少返回的行数
we can apply partition by in our example table
我们可以在示例表中应用partition by
select SUM(Mark) OVER (PARTITION BY id) AS marksum, firstname from TableA
result :
结果:
marksum firstname
-------------------
134 arun
134 arun
134 arun
94 ann
94 ann
41 sruthy
47 new
look at the results it will partition the rows and results all rows not like group by.
看看它将对行进行划分的结果,并得出与group by不同的所有行。
#3
42
partition by
doesn't actually roll up the data. It allows you to reset something on a per group basis. For example, you can get an ordinal column within a group by partitioning on the grouping field and using rownum()
over the rows within that group. This gives you something that behaves a bit like an identity column that resets at the beginning of each group.
分区by实际上并没有卷起数据。它允许你在每个组的基础上重置一些东西。例如,您可以在分组字段上进行分区,并在组内的行上使用rownum(),从而获得组中的序号列。这给了您一些行为有点像标识列的东西,标识列在每个组的开头重置。
#4
33
PARTITION BY Divides the result set into partitions. The window function is applied to each partition separately and computation restarts for each partition.
通过将结果集划分为分区进行分区。窗口函数分别应用于每个分区,每个分区的计算重新启动。
Found at this link: OVER Clause
在这个链接中找到:OVER子句
#5
23
It provides rolled-up data without rolling up
它提供了卷起来的数据,而没有卷起来。
i.e. Suppose I want to return the relative position of sales region
例如,假设我想返回销售区域的相对位置
Using PARTITION BY, I can return the sales amount for a given region and the MAX amount across all sales regions in the same row.
使用PARTITION BY,我可以返回给定区域的销售额和同一行的所有销售区域的最大销售额。
This does mean you will have repeating data, but it may suit the end consumer in the sense that data has been aggregated but no data has been lost - as would be the case with GROUP BY.
这确实意味着您将拥有重复的数据,但它可能适合终端消费者,因为数据已经被聚合,但没有数据丢失——就像GROUP BY的情况一样。
#6
21
PARTITION BY
is analytic, while GROUP BY
is aggregate. In order to use PARTITION BY
, you have to contain it with an OVER clause.
按是解析的,按是聚合的。为了使用PARTITION BY,必须包含一个OVER子句。
#7
18
As of my understanding Partition By is almost identical to Group By, but with the following differences:
根据我的理解,划分By与Group By几乎相同,但有以下区别:
That group by actually groups the result set returning one row per group, which results therefore in SQL Server only allowing in the SELECT list aggregate functions or columns that are part of the group by clause (in which case SQL Server can guarantee that there are unique results for each group).
这组的组每组结果集返回一行,因此导致在SQL Server中只允许选择聚合函数或列列表group by子句的一部分(在这种情况下,SQL服务器可以保证有独特的结果为每个组)。
Consider for example MySQL which allows to have in the SELECT list columns that are not defined in the Group By clause, in which case one row is still being returned per group, however if the column doesn't have unique results then there is no guarantee what will be the output!
考虑例如MySQL可以选择列表中的列不是Group By子句中定义的,在这种情况下,仍被返回的每一行集团,然而如果列没有独特的结果就没有保证会输出什么!
But with Partition By, although the results of the function are identical to the results of an aggregate function with Group By, still you are getting the normal result set, which means that one is getting one row per underlying row, and not one row per group, and because of this one can have columns that are not unique per group in the SELECT list.
但与分区,尽管函数的结果是相同的结果的聚合函数组,你仍然得到正常的结果集,这意味着潜在一正一行一行,而不是每组一行,因为这一列,每组在选择列表中并不是唯一的。
So as a summary, Group By would be best when needs an output of one row per group, and Partition By would be best when one needs all the rows but still wants the aggregate function based on a group.
综上所述,当需要每个组输出一行时,Group By是最好的,而当需要所有行但仍然需要基于Group的聚合函数时,Partition By是最好的。
Of course there might also be performance issues, see http://social.msdn.microsoft.com/Forums/ms-MY/transactsql/thread/0b20c2b5-1607-40bc-b7a7-0c60a2a55fba.
当然也可能存在性能问题,请参阅http://social.msdn.microsoft.com/Forums/ms-MY/transactsql/thread/0b20c2b5-1607-40bc-b7a7-0c60a2a55fba。
#8
0
Suppose we have 14 records of name
column in table
假设表中有14条name列记录
in group by
在集团
select name,count(*) as totalcount from person where name='Please fill out' group BY name;
it will give count in single row i.e 14
它会在单行i中给出count。e 14
but in partition by
但在分区
select row_number() over (partition by name) as total from person where name = 'Please fill out';
it will 14 rows of increase in count
它将增加14行计数
#9
0
Small observation. Automation mechanism to dynamically generate SQL using the 'partition by' it is much simpler to implement in relation to the 'group by'. In the case of 'group by', We must take care of the content of 'select' column.
小的观察。使用“分区by”动态生成SQL的自动化机制,相对于“group by”实现起来要简单得多。对于“group by”,我们必须注意“select”列的内容。
Sorry for My English.
对不起,我的英语。
#10
-1
-- BELOW IS A SAMPLE WHICH OUTLINES THE SIMPLE DIFFERENCES
-- READ IT AND THEN EXECUTE IT
-- THERE ARE THREE ROWS OF EACH COLOR INSERTED INTO THE TABLE
-- CREATE A database called testDB
-- use testDB
USE [TestDB]
GO
-- create Paints table
CREATE TABLE [dbo].[Paints](
[Color] [varchar](50) NULL,
[glossLevel] [varchar](50) NULL
) ON [PRIMARY]
GO
-- Populate Table
insert into paints (color, glossLevel)
select 'red', 'eggshell'
union
select 'red', 'glossy'
union
select 'red', 'flat'
union
select 'blue', 'eggshell'
union
select 'blue', 'glossy'
union
select 'blue', 'flat'
union
select 'orange', 'glossy'
union
select 'orange', 'flat'
union
select 'orange', 'eggshell'
union
select 'green', 'eggshell'
union
select 'green', 'glossy'
union
select 'green', 'flat'
union
select 'black', 'eggshell'
union
select 'black', 'glossy'
union
select 'black', 'flat'
union
select 'purple', 'eggshell'
union
select 'purple', 'glossy'
union
select 'purple', 'flat'
union
select 'salmon', 'eggshell'
union
select 'salmon', 'glossy'
union
select 'salmon', 'flat'
/* COMPARE 'GROUP BY' color to 'OVER (PARTITION BY Color)' */
-- GROUP BY Color
-- row quantity defined by group by
-- aggregate (count(*)) defined by group by
select count(*) from paints
group by color
-- OVER (PARTITION BY... Color
-- row quantity defined by main query
-- aggregate defined by OVER-PARTITION BY
select color
, glossLevel
, count(*) OVER (Partition by color)
from paints
/* COMPARE 'GROUP BY' color, glossLevel to 'OVER (PARTITION BY Color, GlossLevel)' */
-- GROUP BY Color, GlossLevel
-- row quantity defined by GROUP BY
-- aggregate (count(*)) defined by GROUP BY
select count(*) from paints
group by color, glossLevel
-- Partition by Color, GlossLevel
-- row quantity defined by main query
-- aggregate (count(*)) defined by OVER-PARTITION BY
select color
, glossLevel
, count(*) OVER (Partition by color, glossLevel)
from paints