I have a SQL query written by someone else and I'm trying to figure out what it does. Can someone please explain what the PARTITION BY and ROW_NUMBER keywords do here, give a simple example of them in action, and explain why one would want to use them?
An example of partition by:
(SELECT cdt.*,
ROW_NUMBER ()
OVER (PARTITION BY cdt.country_code, cdt.account, cdt.currency
ORDER BY cdt.country_code, cdt.account, cdt.currency)
seq_no
FROM CUSTOMER_DETAILS cdt);
I've seen some examples online, but they go into a bit too much depth.
Thanks in advance!
4 Answers
#1
89
PARTITION BY segregates rows into sets, which lets window functions (ROW_NUMBER(), COUNT(), SUM(), etc.) work on each related set independently.
In your query, each related set consists of the rows that share the same cdt.country_code, cdt.account, and cdt.currency. When you partition on those columns and apply ROW_NUMBER, the rows within each combination/set receive sequential numbers from ROW_NUMBER, starting again at 1 for every set.
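Here is a minimal sketch of that idea. It runs the SQL through Python's built-in sqlite3 module purely so it can be tried without a database server; SQLite (3.25 or later, for window functions) is my assumption and not the database from the question, and the tbl table with its amount column is made up for illustration.

```python
import sqlite3

# Assumption: SQLite >= 3.25, which supports window functions.
# The table and its data are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (hi TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("A", 10), ("A", 20), ("A", 30), ("B", 5), ("B", 7)],
)

# Each window function is evaluated separately inside every "hi" partition.
rows = conn.execute(
    """
    SELECT hi,
           amount,
           ROW_NUMBER() OVER (PARTITION BY hi ORDER BY amount) AS rn,
           COUNT(*)     OVER (PARTITION BY hi)                 AS cnt,
           SUM(amount)  OVER (PARTITION BY hi)                 AS total
    FROM tbl
    ORDER BY hi, amount
    """
).fetchall()

for row in rows:
    print(row)
```

The numbering restarts at 1 for the B rows, and cnt and total are computed per partition rather than over the whole table.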
But your query is funny: if you partition by some unique data and put a row_number on it, it will just produce the same number for every row. It's like doing an ORDER BY inside a partition that is guaranteed to hold a single row. As an example, think of a GUID as a unique combination of cdt.country_code, cdt.account, cdt.currency.
newid() produces a GUID, so what would you expect from this expression?
select
hi,ho,
row_number() over(partition by newid() order by hi,ho)
from tbl;
...Right: nothing is really grouped, because every row ends up in a partition of its own, so every row's row_number is set to 1.
Basically, you should partition on non-unique columns. The ORDER BY inside OVER only matters when the PARTITION BY combination is non-unique; otherwise every row_number will be 1.
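A small sketch of the difference, again run through Python's sqlite3 module (an assumption on my part, chosen only so it is easy to try). Since SQLite has no newid(), a unique id column stands in for the GUID; that column and the sample rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite >= 3.25. The unique "id" column plays the role of the GUID.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, hi TEXT, ho TEXT)")
conn.executemany(
    "INSERT INTO tbl (hi, ho) VALUES (?, ?)",
    [("A", "X"), ("A", "Y"), ("B", "W"), ("B", "W")],
)

# Partitioning on a unique value: every row is alone in its partition.
unique_partition = conn.execute(
    "SELECT hi, ho, ROW_NUMBER() OVER (PARTITION BY id ORDER BY hi, ho) AS rn "
    "FROM tbl ORDER BY id"
).fetchall()

# Partitioning on a non-unique column: rows that share "hi" are numbered together.
shared_partition = conn.execute(
    "SELECT hi, ho, ROW_NUMBER() OVER (PARTITION BY hi ORDER BY ho) AS rn "
    "FROM tbl ORDER BY hi, rn"
).fetchall()

print(unique_partition)
print(shared_partition)
```

In the first result every rn is 1; in the second the numbering counts up within each hi value.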
For example, say this is your data:
create table tbl(hi varchar, ho varchar);
insert into tbl values
('A','X'),
('A','Y'),
('A','Z'),
('B','W'),
('B','W'),
('C','L'),
('C','L');
Then this is analogous to your query:
select
hi,ho,
row_number() over(partition by hi,ho order by hi,ho)
from tbl;
What will be the output of that?
HI HO COLUMN_2
A X 1
A Y 1
A Z 1
B W 1
B W 2
C L 1
C L 2
You see the combinations of HI and HO? The first three rows each have a unique combination, hence they are all set to 1. The B rows share the same W, hence they get different ROW_NUMBERs; likewise with the HI C rows.
Now, why is the ORDER BY needed there? If the previous developer merely wanted to put a row_number on identical data (e.g. HI B, where all the rows are B-W, B-W), he could just do this:
select
hi,ho,
row_number() over(partition by hi,ho)
from tbl;
But alas, Oracle (and SQL Server too) doesn't allow ROW_NUMBER over a partition with no ORDER BY, whereas in PostgreSQL the ORDER BY in the OVER clause is optional: http://www.sqlfiddle.com/#!1/27821/1
select
hi,ho,
row_number() over(partition by hi,ho)
from tbl;
The ORDER BY on your partition looks a bit redundant, but that isn't necessarily the previous developer's fault: some databases just don't allow PARTITION BY with no ORDER BY, and he might not have been able to find a good candidate column to sort on. If the PARTITION BY columns and the ORDER BY columns are the same, you would normally just remove the ORDER BY, but since some databases don't allow that, you can do this instead:
SELECT cdt.*,
ROW_NUMBER ()
OVER (PARTITION BY cdt.country_code, cdt.account, cdt.currency
ORDER BY newid())
seq_no
FROM CUSTOMER_DETAILS cdt
You cannot find a good column to use for sorting identical data? You might as well sort randomly; the rows in each partition have the same values anyway. You can use a GUID, for example (newid() in SQL Server). That gives the same kind of output as the previous developer's query. It's unfortunate that some databases don't allow PARTITION BY with no ORDER BY.
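To see that a random sort key really does produce the same numbering, here is a rough sketch using Python's sqlite3 module. This is an assumption for illustration only: SQLite has no newid(), so its random() function is swapped in as the arbitrary sort key, and the data is the hi/ho sample from above.

```python
import sqlite3

# Assumption: SQLite >= 3.25; random() stands in for SQL Server's newid().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (hi TEXT, ho TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("A", "X"), ("A", "Y"), ("A", "Z"), ("B", "W"), ("B", "W"), ("C", "L"), ("C", "L")],
)

rows = conn.execute(
    "SELECT hi, ho, ROW_NUMBER() OVER (PARTITION BY hi, ho ORDER BY random()) AS rn "
    "FROM tbl"
).fetchall()

# The rows inside a partition are identical, so whichever one happens to be
# numbered first, the sorted result always looks the same.
numbered = sorted(rows)
for row in numbered:
    print(row)
```

Which physical B-W row gets 1 and which gets 2 is arbitrary, but because those rows are indistinguishable the visible output does not change.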
Though really, it eludes me: I cannot find a good reason to put a number on identical combinations (B-W, B-W in the example above). It gives the impression that the database holds redundant data. Somehow it reminded me of this: How to get one unique record from the same list of records from table? No Unique constraint in the table
It really looks arcane to see a PARTITION BY with the same combination of columns as the ORDER BY; you cannot easily infer the code's intent.
Live test: http://www.sqlfiddle.com/#!3/27821/6
But as dbaseman has also noticed, it's useless to partition and order on the same columns.
You have a set of data like this:
create table tbl(hi varchar, ho varchar);
insert into tbl values
('A','X'),
('A','X'),
('A','X'),
('B','Y'),
('B','Y'),
('C','Z'),
('C','Z');
Then you PARTITION BY hi,ho; and then you ORDER BY hi,ho. There's no sense in numbering identical data :-) http://www.sqlfiddle.com/#!3/29ab8/3
select
hi,ho,
row_number() over(partition by hi,ho order by hi,ho) as nr
from tbl;
Output:
HI HO ROW_QUERY_A
A X 1
A X 2
A X 3
B Y 1
B Y 2
C Z 1
C Z 2
See? Why would you need to put row numbers on identical combinations? What will you analyze on a triple A,X, a double B,Y, or a double C,Z? :-)
You just need to PARTITION BY a non-unique column, then ORDER BY a column that tells the rows within each partition apart. An example will make it clearer:
create table tbl(hi varchar, ho varchar);
insert into tbl values
('A','D'),
('A','E'),
('A','F'),
('B','F'),
('B','E'),
('C','E'),
('C','D');
select
hi,ho,
row_number() over(partition by hi order by ho) as nr
from tbl;
PARTITION BY hi operates on a non-unique column; then, within each partition, you order on a column whose values are unique there (ho): ORDER BY ho.
Output:
HI HO NR
A D 1
A E 2
A F 3
B E 1
B F 2
C D 1
C E 2
That result makes more sense.
Live test: http://www.sqlfiddle.com/#!3/d0b44/1
And this is similar to your query, with the same columns in both PARTITION BY and ORDER BY:
select
hi,ho,
row_number() over(partition by hi,ho order by hi,ho) as nr
from tbl;
And this is the output:
HI HO NR
A D 1
A E 1
A F 1
B E 1
B F 1
C D 1
C E 1
See? Makes no sense, right?
Live test: http://www.sqlfiddle.com/#!3/d0b44/3
Finally, this might be the right query:
SELECT cdt.*,
ROW_NUMBER ()
OVER (PARTITION BY cdt.country_code, cdt.account -- removed: cdt.currency
ORDER BY
-- removed: cdt.country_code, cdt.account,
cdt.currency) -- keep
seq_no
FROM CUSTOMER_DETAILS cdt
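As a rough check of what that rewritten query would return, here is a sketch that runs it through Python's sqlite3 module. SQLite is an assumption made only so the example is easy to run; the real table's columns beyond the three named in the query are unknown, and the sample rows are invented.

```python
import sqlite3

# Assumption: SQLite >= 3.25. Only the three columns named in the query are
# modelled, and every row below is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_details (country_code TEXT, account TEXT, currency TEXT)"
)
conn.executemany(
    "INSERT INTO customer_details VALUES (?, ?, ?)",
    [
        ("US", "XYZ", "USD"),
        ("US", "XYZ", "EUR"),
        ("US", "ABC", "USD"),
        ("DE", "XYZ", "EUR"),
        ("DE", "XYZ", "CHF"),
        ("DE", "XYZ", "USD"),
    ],
)

# Number the currencies within each (country_code, account) pair.
seq = conn.execute(
    """
    SELECT cdt.*,
           ROW_NUMBER() OVER (PARTITION BY cdt.country_code, cdt.account
                              ORDER BY cdt.currency) AS seq_no
    FROM customer_details cdt
    ORDER BY cdt.country_code, cdt.account, seq_no
    """
).fetchall()

for row in seq:
    print(row)
```

Here seq_no answers a real question: it is the position of each currency within its country and account.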
#2
7
That selects the row number per country code, account, and currency. So, the rows with country code "US", account "XYZ" and currency "$USD" will each get a row number assigned from 1-n; the same goes for every other combination of those columns in the result set.
This query is kind of funny, because the order by clause does absolutely nothing. All the rows in each partition have the same country code, account, and currency, so there's no point ordering by those columns. The ultimate row numbers assigned in this particular query will therefore be unpredictable.
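One way to make the numbering predictable is to order by a column that actually differs between the rows of a partition. The sketch below uses Python's sqlite3 module and a hypothetical opened date column; neither the column nor the data comes from the original table, so treat it as an illustration of the idea only.

```python
import sqlite3

# Assumption: SQLite >= 3.25. The "opened" column is hypothetical; it stands
# for any column that distinguishes rows sharing the same partition key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_details "
    "(country_code TEXT, account TEXT, currency TEXT, opened TEXT)"
)
conn.executemany(
    "INSERT INTO customer_details VALUES (?, ?, ?, ?)",
    [
        ("US", "XYZ", "USD", "2020-03-01"),
        ("US", "XYZ", "USD", "2019-01-15"),
        ("US", "XYZ", "USD", "2021-07-30"),
    ],
)

# Ordering by "opened" decides which row gets 1, 2 and 3.
numbered = conn.execute(
    """
    SELECT opened,
           ROW_NUMBER() OVER (PARTITION BY country_code, account, currency
                              ORDER BY opened) AS seq_no
    FROM customer_details
    ORDER BY seq_no
    """
).fetchall()

print(numbered)
```

With the partition columns alone in the ORDER BY, any of the three rows could have been numbered 1.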
Hope that helps...
#3
4
I often use row_number() as a quick way to discard duplicate records from my select statements. Just add a where clause. Something like...
select a, b, rn
from (select a, b, row_number() over (partition by a, b order by a, b) as rn
      from my_table) t  -- my_table is a placeholder name; some databases require the alias t
where rn = 1;
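A quick way to try this out is with Python's sqlite3 module. That choice is an assumption made for convenience; my_table is a placeholder name and the rows are invented.

```python
import sqlite3

# Assumption: SQLite >= 3.25; my_table and its contents are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("A", "X"), ("A", "X"), ("A", "X"), ("B", "Y"), ("B", "Y"), ("C", "Z")],
)

# Keep only the first row of every (a, b) partition, discarding the duplicates.
deduped = conn.execute(
    """
    SELECT a, b
    FROM (SELECT a, b,
                 ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY a, b) AS rn
          FROM my_table) AS t
    WHERE rn = 1
    ORDER BY a, b
    """
).fetchall()

print(deduped)
```

Six input rows collapse to one row per distinct (a, b) pair.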
#4
2
I know this is an old thread, but PARTITION BY is the equivalent of GROUP BY, not ORDER BY. ORDER BY in this function is . . . ORDER BY. It's just a way to create uniqueness out of redundancy by adding a sequence number. Or you may eliminate the other redundant records with a WHERE clause that references the aliased column for the function. However, DISTINCT in the SELECT statement would probably accomplish the same thing in that regard.
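To make the comparison concrete, here is a small sketch run through Python's sqlite3 module (my assumption, for easy testing; the table and rows are made up). It shows that GROUP BY collapses each group to one row while PARTITION BY keeps every row, and that DISTINCT returns the same rows as the rn = 1 filter when only the partition columns are selected.

```python
import sqlite3

# Assumption: SQLite >= 3.25; tbl and its rows are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (hi TEXT, ho TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("A", "X"), ("A", "X"), ("B", "Y")],
)

# GROUP BY returns one row per group.
grouped = conn.execute(
    "SELECT hi, COUNT(*) FROM tbl GROUP BY hi ORDER BY hi"
).fetchall()

# PARTITION BY computes the same count but keeps every input row.
windowed = conn.execute(
    "SELECT hi, COUNT(*) OVER (PARTITION BY hi) FROM tbl ORDER BY hi"
).fetchall()

# DISTINCT versus filtering on the first row number of each partition.
distinct_rows = conn.execute(
    "SELECT DISTINCT hi, ho FROM tbl ORDER BY hi, ho"
).fetchall()
first_rows = conn.execute(
    """
    SELECT hi, ho
    FROM (SELECT hi, ho,
                 ROW_NUMBER() OVER (PARTITION BY hi, ho ORDER BY hi, ho) AS rn
          FROM tbl) AS t
    WHERE rn = 1
    ORDER BY hi, ho
    """
).fetchall()

print(grouped, windowed, distinct_rows, first_rows)
```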