Assigning sequence numbers to rows with repeated values

Time: 2021-11-13 22:59:02

I have the following table:

ITEM    DATE        VALUE
----------------------
ITEM1   2016-05-04  1
ITEM1   2016-05-05  3
ITEM1   2016-05-06  3
ITEM1   2016-05-09  3
ITEM1   2016-05-10  4
ITEM2   2016-05-04  1
ITEM2   2016-05-05  2
ITEM2   2016-05-06  3
ITEM2   2016-05-09  1
ITEM2   2016-05-10  1

For each item, I want to know how many consecutive entries back in time (ordered by date) the value column has stayed the same, i.e. how long it has been flat:

ITEM    DATE        VALUE   NUM_FLAT_ENTRYPOINTS
------------------------------------------------
ITEM1   2016-05-04  1       0
ITEM1   2016-05-05  3       0
ITEM1   2016-05-06  3       1
ITEM1   2016-05-09  3       2
ITEM1   2016-05-10  4       0
ITEM2   2016-05-04  1       0
ITEM2   2016-05-05  2       0
ITEM2   2016-05-06  3       0
ITEM2   2016-05-09  1       0
ITEM2   2016-05-10  1       1

My initial thought was:

select 
    *,
    rank() over (partition by ITEM,VALUE order by DATE) - 1 as NUM_FLAT_ENTRYPOINTS
from my_table

This does not work, however: for ITEM2 it would put 2016-05-04, 2016-05-09 and 2016-05-10 into the same partition (all three have VALUE 1), so the last line would show 2 instead of 1 for NUM_FLAT_ENTRYPOINTS.
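
To make the failure concrete, here is a minimal sketch that runs the naive query against ITEM2's rows only. The table variable @demo and the column name dt are assumptions made for illustration; they are not part of the original table.

```sql
-- Hypothetical table variable holding only ITEM2's rows.
DECLARE @demo TABLE (ITEM varchar(50), dt date, VALUE int);
INSERT INTO @demo (ITEM, dt, VALUE) VALUES
('ITEM2', '2016-05-04', 1),
('ITEM2', '2016-05-05', 2),
('ITEM2', '2016-05-06', 3),
('ITEM2', '2016-05-09', 1),
('ITEM2', '2016-05-10', 1);

-- Partitioning by (ITEM, VALUE) ignores the rows in between,
-- so all three VALUE = 1 rows are ranked together.
SELECT
    ITEM,
    dt,
    VALUE,
    RANK() OVER (PARTITION BY ITEM, VALUE ORDER BY dt) - 1 AS naive_count
FROM @demo
ORDER BY ITEM, dt;
```

With this data the 2016-05-09 row gets naive_count 1 and the 2016-05-10 row gets 2, whereas the desired values are 0 and 1: the run of 1s has to restart after the 2 and the 3.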

I am using Microsoft SQL Server 2008.

Any ideas?

Edit:

In Oracle (and possibly other database servers) it seems I can just do

select
    count(VALUE) over (partition by ITEM,VALUE order by DATE) - 1 as NUM_FLAT_ENTRYPOINTS
from my_table

but as far as I can tell this syntax does not work in SQL Server 2008. Any way to work around it?

3 solutions

#1


1  

It looks like a variation of gaps-and-islands.

Sample data

DECLARE @T TABLE (ITEM varchar(50), dt date, VALUE int);
INSERT INTO @T(ITEM, dt, VALUE) VALUES
('ITEM1', '2016-05-04', 1),
('ITEM1', '2016-05-05', 3),
('ITEM1', '2016-05-06', 3),
('ITEM1', '2016-05-09', 3),
('ITEM1', '2016-05-10', 4),
('ITEM2', '2016-05-04', 1),
('ITEM2', '2016-05-05', 2),
('ITEM2', '2016-05-06', 3),
('ITEM2', '2016-05-09', 1),
('ITEM2', '2016-05-10', 1);

Query

WITH
CTE
AS
(
    SELECT
        ITEM
        ,dt
        ,VALUE
        ,ROW_NUMBER() OVER (PARTITION BY ITEM ORDER BY dt) AS rn1
        ,ROW_NUMBER() OVER (PARTITION BY ITEM, VALUE ORDER BY dt) AS rn2
    FROM @T
)
SELECT
    ITEM
    ,dt
    ,VALUE
    ,rn1-rn2 AS rnDiff
    ,ROW_NUMBER() OVER 
        (PARTITION BY ITEM, VALUE, rn1-rn2 ORDER BY dt) - 1 AS NUM_FLAT_ENTRYPOINTS
FROM CTE
ORDER BY ITEM, dt;

Result

+-------+------------+-------+--------+----------------------+
| ITEM  |     dt     | VALUE | rnDiff | NUM_FLAT_ENTRYPOINTS |
+-------+------------+-------+--------+----------------------+
| ITEM1 | 2016-05-04 |     1 |      0 |                    0 |
| ITEM1 | 2016-05-05 |     3 |      1 |                    0 |
| ITEM1 | 2016-05-06 |     3 |      1 |                    1 |
| ITEM1 | 2016-05-09 |     3 |      1 |                    2 |
| ITEM1 | 2016-05-10 |     4 |      4 |                    0 |
| ITEM2 | 2016-05-04 |     1 |      0 |                    0 |
| ITEM2 | 2016-05-05 |     2 |      1 |                    0 |
| ITEM2 | 2016-05-06 |     3 |      2 |                    0 |
| ITEM2 | 2016-05-09 |     1 |      2 |                    0 |
| ITEM2 | 2016-05-10 |     1 |      2 |                    1 |
+-------+------------+-------+--------+----------------------+
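
Why rnDiff separates the runs may be easier to see with both row numbers side by side. The following is only a sketch for inspection, not part of the answer's query; the table variable @runs is an assumed name and holds just ITEM2's rows.

```sql
-- Hypothetical table variable with ITEM2's rows from the sample data.
DECLARE @runs TABLE (ITEM varchar(50), dt date, VALUE int);
INSERT INTO @runs (ITEM, dt, VALUE) VALUES
('ITEM2', '2016-05-04', 1),
('ITEM2', '2016-05-05', 2),
('ITEM2', '2016-05-06', 3),
('ITEM2', '2016-05-09', 1),
('ITEM2', '2016-05-10', 1);

-- rn1 counts every row of the item; rn2 counts only rows with the same VALUE.
-- Inside an unbroken run both grow by 1 per row, so rn1 - rn2 stays constant.
-- A row with a different VALUE advances rn1 but not rn2, so the next run of
-- the same VALUE ends up with a larger difference.
SELECT
    ITEM,
    dt,
    VALUE,
    ROW_NUMBER() OVER (PARTITION BY ITEM ORDER BY dt) AS rn1,
    ROW_NUMBER() OVER (PARTITION BY ITEM, VALUE ORDER BY dt) AS rn2
FROM @runs
ORDER BY ITEM, dt;
```

Here rn1 runs 1 to 5 while rn2 is 1, 1, 1, 2, 3, giving differences 0, 1, 2, 2, 2. The first VALUE = 1 row (difference 0) is therefore split from the last two (difference 2). The VALUE = 3 row also has difference 2, which is why VALUE has to stay in the final PARTITION BY alongside rn1-rn2.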

#2


1  

Assuming the correction to the sample data I suggested in the comments, this seems to fit the bill:

declare @t table (ITEM char(5), Date date, Value tinyint)
insert into @t(ITEM,DATE,VALUE) values
('ITEM1','20160504',1),
('ITEM1','20160505',3),
('ITEM1','20160506',3),
('ITEM1','20160509',3),
('ITEM1','20160510',4),
('ITEM2','20160504',1),
('ITEM2','20160505',2),
('ITEM2','20160506',3),
('ITEM2','20160509',1),
('ITEM2','20160510',1)

;With Ordered as (
    select
        Item,
        Date,
        Value,
        ROW_NUMBER() OVER (PARTITION BY Item ORDER BY Date) as rn
    from @t
)
select
    *,
    COALESCE(rn -
        (select MAX(o2.rn) from Ordered o2
        where o2.ITEM = o.ITEM and
            o2.rn < o.rn and
            o2.Value != o.Value) - 1
    , o.rn - 1) as NUM_FLAT_ENTRYPOINTS
from
    Ordered o

That is, we assign row numbers (separately for each item), and then we simply find the latest row number earlier than the current one where Value is different. Subtracting these row numbers (and a further 1) produces the answer we need - assuming such an earlier row can be found. If there's no such earlier row then we're obviously in a sequence that's at the start for a particular item - so we just subtract 1 from the row number.

I've gone for "obviously correct" here - it's possible that there's a way to produce the result that may perform better but I'm not aiming for that right now.
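
As an illustration of the steps just described, the sketch below exposes the intermediate "latest earlier row with a different value" as its own column. It is a rewrite using OUTER APPLY rather than the query above, and the names @steps, dt, prev and last_diff_rn are assumptions introduced here; only ITEM1's rows are loaded.

```sql
-- Hypothetical table variable with ITEM1's rows from the sample data.
DECLARE @steps TABLE (ITEM char(5), dt date, VALUE tinyint);
INSERT INTO @steps (ITEM, dt, VALUE) VALUES
('ITEM1', '20160504', 1),
('ITEM1', '20160505', 3),
('ITEM1', '20160506', 3),
('ITEM1', '20160509', 3),
('ITEM1', '20160510', 4);

WITH Ordered AS (
    SELECT
        ITEM,
        dt,
        VALUE,
        ROW_NUMBER() OVER (PARTITION BY ITEM ORDER BY dt) AS rn
    FROM @steps
)
SELECT
    o.ITEM,
    o.dt,
    o.VALUE,
    o.rn,
    prev.last_diff_rn,  -- NULL when no earlier row has a different VALUE
    -- Treating "no earlier different row" as row 0 reproduces the rn - 1 fallback.
    o.rn - COALESCE(prev.last_diff_rn, 0) - 1 AS NUM_FLAT_ENTRYPOINTS
FROM Ordered o
OUTER APPLY (
    SELECT MAX(o2.rn) AS last_diff_rn
    FROM Ordered o2
    WHERE o2.ITEM = o.ITEM
      AND o2.rn < o.rn
      AND o2.VALUE <> o.VALUE
) AS prev
ORDER BY o.ITEM, o.rn;
```

For ITEM1 the last_diff_rn column comes out as NULL, 1, 1, 1, 4, so NUM_FLAT_ENTRYPOINTS is 0, 0, 1, 2, 0, matching the expected output.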

Results:

Item  Date       Value rn                   NUM_FLAT_ENTRYPOINTS
----- ---------- ----- -------------------- --------------------
ITEM1 2016-05-04 1     1                    0
ITEM1 2016-05-05 3     2                    0
ITEM1 2016-05-06 3     3                    1
ITEM1 2016-05-09 3     4                    2
ITEM1 2016-05-10 4     5                    0
ITEM2 2016-05-04 1     1                    0
ITEM2 2016-05-05 2     2                    0
ITEM2 2016-05-06 3     3                    0
ITEM2 2016-05-09 1     4                    0
ITEM2 2016-05-10 1     5                    1

#3


1  

Try this:

SELECT ITEM, [DATE], VALUE,
       ROW_NUMBER() OVER (PARTITION BY ITEM, VALUE, grp 
                          ORDER BY [DATE]) - 1 AS NUM_FLAT_ENTRYPOINTS 
FROM (
SELECT ITEM, [DATE], VALUE,
       ROW_NUMBER() OVER (PARTITION BY ITEM ORDER BY [DATE]) - 
       ROW_NUMBER() OVER (PARTITION BY ITEM, VALUE ORDER BY [DATE]) AS grp
FROM mytable) AS t
