原文链接:http://www.tbk.ren/article/255.html?source=csdn
有的同学在处理自己的业务逻辑的时候,需要用到Row Number OVER Partition的方法,那么,什么是Row Number OVER Partition呢?
例如我们有下面的数据,第一列是用户的ID,第二列是用户的购买日期,现在如果我们需要判断用户是否重复购买,并且,每一次的购买,下一次的购买时间间隔是多少呢?
id,date
1,20150601
1,20150603
2,20150601
2,20150605
2,20150610
3,20150503
3,20150603
4,20150601
如果我们可以处理称为下面的数据,那么这个问题就完美解决了:
id date.x date.y
0 1 20150601 20150603
1 1 20150603 NaN
2 2 20150601 20150605
3 2 20150605 20150610
4 2 20150610 NaN
5 3 20150503 20150603
6 3 20150603 NaN
7 4 20150601 NaN
date_y就是下一次的购买时间,date_y-date_x就是前后两次的购买间隔。
把数据处理成这种格式,使用Row Number OVER Partition或者是说Rank OVER Partition是最简单的方法。
我们先来生成数据:
CREATE TABLE t (
ID INT,
DATE INT
) ENGINE=MYISAM DEFAULT CHARSET=utf8;
INSERT INTO t VALUES
(1,20150601),
(1,20150603),
(2,20150601),
(2,20150605),
(2,20150610),
(3,20150503),
(3,20150603),
(4,20150601);
接着,我们来在MySQL中,实现Row Number OVER Partition或者是说Rank OVER Partition的方法:
SET @id_rank:=0;
SELECT id, DATE, id_rank FROM (
SELECT id, DATE,
@id_rank := IF(@id = id, @id_rank + 1, 1) AS id_rank,
@id := id
FROM t
ORDER BY id, DATE
) a;
该查询,返回的结果如下:
"id" "DATE""id_rank"
"1" "20150601""1"
"1" "20150603""2"
"2" "20150601""1"
"2" "20150605""2"
"2" "20150610""3"
"3" "20150503""1"
"3" "20150603""2"
"4" "20150601""1"
使用这个结果,我们就可以使用Left Join的逻辑,计算出我们的目标表格:
SET @id_rank:=0;
CREATE TABLE temp ENGINE=MYISAM DEFAULT CHARSET=utf8 AS
SELECT id, DATE, id_rank FROM (
SELECT id, DATE,
@id_rank := IF(@id = id, @id_rank + 1, 1) AS id_rank,
@id := id
FROM t
ORDER BY id, DATE
) a;
SELECT a.id, a.date, b.date FROM (
SELECT id, DATE, id_rank FROM temp
) a LEFT JOIN (
SELECT id, DATE, (id_rank-1) AS id_rank_1 FROM temp
) b ON a.id=b.id AND a.id_rank=b.id_rank_1;
得到的结果如下所示:
"id" "date""date"
"1" "20150601""20150603"
"1" "20150603"\N
"2" "20150601""20150605"
"2" "20150605""20150610"
"2" "20150610"\N
"3" "20150503""20150603"
"3" "20150603"\N
"4" "20150601"\N
如果是大量的数据,我们记得一定要在CREATE TABLE temp的时候,创建索引,这样子执行起来,才会杠杠滴。
最后,千万不要忘记,删掉temp表
DROP TABLE temp;