I have the following table:
ITEM DATE VALUE
----------------------
ITEM1 2016-05-04 1
ITEM1 2016-05-05 3
ITEM1 2016-05-06 3
ITEM1 2016-05-09 3
ITEM1 2016-05-10 4
ITEM2 2016-05-04 1
ITEM2 2016-05-05 2
ITEM2 2016-05-06 3
ITEM2 2016-05-09 1
ITEM2 2016-05-10 1
I want to get, per item, how many consecutive entries back in time the VALUE column has stayed the same (flat):
ITEM DATE VALUE NUM_FLAT_ENTRYPOINTS
------------------------------
ITEM1 2016-05-04 1 0
ITEM1 2016-05-05 3 0
ITEM1 2016-05-06 3 1
ITEM1 2016-05-09 3 2
ITEM1 2016-05-10 4 0
ITEM2 2016-05-04 1 0
ITEM2 2016-05-05 2 0
ITEM2 2016-05-06 3 0
ITEM2 2016-05-09 1 0
ITEM2 2016-05-10 1 1
My initial thought was:
select
    *,
    rank() over (partition by ITEM, VALUE order by DATE) - 1 as NUM_FLAT_ENTRYPOINTS
from my_table
This does not work, however: for ITEM2 it would put 2016-05-04, 2016-05-09 and 2016-05-10 in the same partition and show 2 instead of 1 for NUM_FLAT_ENTRYPOINTS on the last row.
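To make the failure concrete, here is a small Python sketch (not part of the original question) that emulates ranking within an (ITEM, VALUE) partition on the ITEM2 rows. The function name `rank_minus_one` is made up for illustration, and because the dates are unique within each partition, the rank is simply a running count.

```python
from collections import defaultdict

# ITEM2 rows from the expected-output table: (item, date, value)
rows = [
    ("ITEM2", "2016-05-04", 1),
    ("ITEM2", "2016-05-05", 2),
    ("ITEM2", "2016-05-06", 3),
    ("ITEM2", "2016-05-09", 1),
    ("ITEM2", "2016-05-10", 1),
]

def rank_minus_one(rows):
    """Emulate RANK() OVER (PARTITION BY item, value ORDER BY date) - 1.

    Assumes dates are unique inside each partition, so there are no ties.
    """
    seen = defaultdict(int)  # rows already emitted per (item, value) partition
    out = []
    # ISO dates sort chronologically when compared as strings
    for item, date, value in sorted(rows, key=lambda r: (r[0], r[1])):
        out.append((item, date, value, seen[(item, value)]))
        seen[(item, value)] += 1
    return out

naive = rank_minus_one(rows)
for row in naive:
    print(row)
```

The value 1 on 2016-05-04 lands in the same partition as the two later rows with value 1, so the counter keeps running across the interruption instead of restarting.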
I am using Microsoft SQL Server 2008.
Any ideas?
Edit:
In Oracle (and possibly other database servers) it seems I can just do
select
    count(VALUE) over (partition by ITEM, VALUE order by DATE) - 1 as NUM_FLAT_ENTRYPOINTS
from my_table
but as far as I can tell this syntax does not work in SQL Server 2008. Any way to work around it?
3 solutions
#1
It looks like a variation of gaps-and-islands.
Sample data
DECLARE @T TABLE (ITEM varchar(50), dt date, VALUE int);
INSERT INTO @T(ITEM, dt, VALUE) VALUES
('ITEM1', '2016-05-04', 1),
('ITEM1', '2016-05-05', 3),
('ITEM1', '2016-05-06', 3),
('ITEM1', '2016-05-09', 3),
('ITEM1', '2016-05-10', 4),
('ITEM2', '2016-05-04', 1),
('ITEM2', '2016-05-05', 2),
('ITEM2', '2016-05-06', 3),
('ITEM2', '2016-05-09', 1),
('ITEM2', '2016-05-10', 1);
Query
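WITH CTE AS
(
    SELECT
        ITEM
        ,dt
        ,VALUE
        -- position of the row within its item, in date order
        ,ROW_NUMBER() OVER (PARTITION BY ITEM ORDER BY dt) AS rn1
        -- position of the row among rows of the same item and value
        ,ROW_NUMBER() OVER (PARTITION BY ITEM, VALUE ORDER BY dt) AS rn2
    FROM @T
)
SELECT
    ITEM
    ,dt
    ,VALUE
    -- rn1-rn2 stays constant within a run of consecutive equal values
    ,rn1-rn2 AS rnDiff
    ,ROW_NUMBER() OVER
        (PARTITION BY ITEM, VALUE, rn1-rn2 ORDER BY dt) - 1 AS NUM_FLAT_ENTRYPOINTS
FROM CTE
ORDER BY ITEM, dt;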
Result
+-------+------------+-------+--------+----------------------+
| ITEM | dt | VALUE | rnDiff | NUM_FLAT_ENTRYPOINTS |
+-------+------------+-------+--------+----------------------+
| ITEM1 | 2016-05-04 | 1 | 0 | 0 |
| ITEM1 | 2016-05-05 | 3 | 1 | 0 |
| ITEM1 | 2016-05-06 | 3 | 1 | 1 |
| ITEM1 | 2016-05-09 | 3 | 1 | 2 |
| ITEM1 | 2016-05-10 | 4 | 4 | 0 |
| ITEM2 | 2016-05-04 | 1 | 0 | 0 |
| ITEM2 | 2016-05-05 | 2 | 1 | 0 |
| ITEM2 | 2016-05-06 | 3 | 2 | 0 |
| ITEM2 | 2016-05-09 | 1 | 2 | 0 |
| ITEM2 | 2016-05-10 | 1 | 2 | 1 |
+-------+------------+-------+--------+----------------------+
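The answer does not spell out why the difference of the two row numbers identifies a run, so here is a rough Python emulation of the same three counters, offered as a sketch rather than as the author's own explanation. The function name `flat_counts` and the tuple layout are assumptions made for this example only.

```python
from collections import defaultdict

# Same sample data as the @T table above: (item, dt, value)
rows = [
    ("ITEM1", "2016-05-04", 1), ("ITEM1", "2016-05-05", 3),
    ("ITEM1", "2016-05-06", 3), ("ITEM1", "2016-05-09", 3),
    ("ITEM1", "2016-05-10", 4), ("ITEM2", "2016-05-04", 1),
    ("ITEM2", "2016-05-05", 2), ("ITEM2", "2016-05-06", 3),
    ("ITEM2", "2016-05-09", 1), ("ITEM2", "2016-05-10", 1),
]

def flat_counts(rows):
    """Return (item, dt, value, rnDiff, NUM_FLAT_ENTRYPOINTS) per row."""
    rn1 = defaultdict(int)       # counter per item
    rn2 = defaultdict(int)       # counter per (item, value)
    position = defaultdict(int)  # counter per (item, value, rn1 - rn2) island
    out = []
    # ISO dates sort chronologically when compared as strings
    for item, dt, value in sorted(rows, key=lambda r: (r[0], r[1])):
        rn1[item] += 1
        rn2[(item, value)] += 1
        # While the value repeats, both counters advance together, so the
        # difference is unchanged; any other value in between advances only
        # rn1, which shifts the difference for a later run of the same value.
        diff = rn1[item] - rn2[(item, value)]
        island = (item, value, diff)
        out.append((item, dt, value, diff, position[island]))
        position[island] += 1
    return out

result = flat_counts(rows)
for row in result:
    print(row)
```

For ITEM2, the value 1 on 2016-05-04 gets a difference of 0 while the two later rows with value 1 get 2, which is what separates them into different islands.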
#2
Assuming the correction to the sample data I suggested in the comments, this seems to fit the bill:
declare @t table (ITEM char(5), Date date, Value tinyint)
insert into @t(ITEM,DATE,VALUE) values
('ITEM1','20160504',1),
('ITEM1','20160505',3),
('ITEM1','20160506',3),
('ITEM1','20160509',3),
('ITEM1','20160510',4),
('ITEM2','20160504',1),
('ITEM2','20160505',2),
('ITEM2','20160506',3),
('ITEM2','20160509',1),
('ITEM2','20160510',1)
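;With Ordered as (
    -- number the rows of each item in date order
    select
        Item,
        Date,
        Value,
        ROW_NUMBER() OVER (PARTITION BY Item ORDER BY Date) as rn
    from @t
)
select
    *,
    COALESCE(rn -
        -- latest earlier row of the same item whose Value differs
        (select MAX(o2.rn) from Ordered o2
         where o2.ITEM = o.ITEM and
               o2.rn < o.rn and
               o2.Value != o.Value) - 1
        -- no such row: the run starts at the item's first row
        , o.rn - 1) as NUM_FLAT_ENTRYPOINTS
from
    Ordered o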
That is, we assign row numbers (separately for each item), and then we simply find the latest row number earlier than the current one where Value is different. Subtracting these row numbers (and a further 1) produces the answer we need - assuming such an earlier row can be found. If there's no such earlier row then we're obviously in a sequence that's at the start for a particular item - so we just subtract 1 from the row number.
I've gone for "obviously correct" here - it's possible that there's a way to produce the result that may perform better but I'm not aiming for that right now.
Results:
Item Date Value rn NUM_FLAT_ENTRYPOINTS
----- ---------- ----- -------------------- --------------------
ITEM1 2016-05-04 1 1 0
ITEM1 2016-05-05 3 2 0
ITEM1 2016-05-06 3 3 1
ITEM1 2016-05-09 3 4 2
ITEM1 2016-05-10 4 5 0
ITEM2 2016-05-04 1 1 0
ITEM2 2016-05-05 2 2 0
ITEM2 2016-05-06 3 3 0
ITEM2 2016-05-09 1 4 0
ITEM2 2016-05-10 1 5 1
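The row-number arithmetic described in this answer can be cross-checked outside the database. The following Python sketch mirrors the correlated subquery and the COALESCE fallback; the function name `flat_counts_by_lookup` is hypothetical and the quadratic scan is kept on purpose to stay close to the query, not to be fast.

```python
from itertools import groupby

# Same sample data as the @t table above: (item, date, value)
rows = [
    ("ITEM1", "2016-05-04", 1), ("ITEM1", "2016-05-05", 3),
    ("ITEM1", "2016-05-06", 3), ("ITEM1", "2016-05-09", 3),
    ("ITEM1", "2016-05-10", 4), ("ITEM2", "2016-05-04", 1),
    ("ITEM2", "2016-05-05", 2), ("ITEM2", "2016-05-06", 3),
    ("ITEM2", "2016-05-09", 1), ("ITEM2", "2016-05-10", 1),
]

def flat_counts_by_lookup(rows):
    """Return (item, date, value, rn, NUM_FLAT_ENTRYPOINTS) per row."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for item, group in groupby(ordered, key=lambda r: r[0]):
        # rn restarts at 1 for every item, like the Ordered CTE
        numbered = list(enumerate(group, start=1))
        for rn, (_, date, value) in numbered:
            # row numbers of earlier rows whose value differs from this one
            earlier = [rn2 for rn2, (_, _, v2) in numbered
                       if rn2 < rn and v2 != value]
            if earlier:
                count = rn - max(earlier) - 1
            else:
                # no differing earlier row: the run began at the first row
                count = rn - 1
            out.append((item, date, value, rn, count))
    return out

lookup = flat_counts_by_lookup(rows)
for row in lookup:
    print(row)
```

The last column reproduces the NUM_FLAT_ENTRYPOINTS column of the results above for this sample data.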
#3
Try this:
SELECT ITEM, [DATE], VALUE,
       ROW_NUMBER() OVER (PARTITION BY ITEM, VALUE, grp
                          ORDER BY [DATE]) - 1 AS NUM_FLAT_ENTRYPOINTS
FROM (
    SELECT ITEM, [DATE], VALUE,
           ROW_NUMBER() OVER (PARTITION BY ITEM ORDER BY [DATE]) -
           ROW_NUMBER() OVER (PARTITION BY ITEM, VALUE ORDER BY [DATE]) AS grp
    FROM mytable) AS t