How do I partition/window date-ordered events using an arbitrary expression?

Time: 2021-06-23 22:55:21

I would like to group some data based on dates and some (potentially arbitrary) indicator:

Date       | Ind
================
2016-01-02 | 1
2016-01-03 | 5
2016-03-02 | 10
2016-03-05 | 15
2016-05-10 | 6
2016-05-11 | 2

I would like to group together consecutive (date-ordered) rows, starting a new group after any row where Ind >= 10:

Date       | Ind | Group
========================
2016-01-02 | 1   |   1
2016-01-03 | 5   |   1
2016-03-02 | 10  |   1

2016-03-05 | 15  |   2

2016-05-10 | 6   |   3
2016-05-11 | 2   |   3

I did find a promising technique at the end of a blog post, "Use this Neat Window Function Trick to Calculate Time Differences in a Time Series" (the final subsection, "Extra Bonus"), but the important part of the query uses a keyword (FILTER) that doesn't seem to be supported in SQL Server (and after a quick Google search I'm still not sure where it is supported!).

I'm still hopeful that a technique using a window function might be the answer. I just need a counter that I can add to every row (like RANK or ROW_NUMBER does), but one that only increments when some arbitrary condition evaluates to true. Is there a way to do this in SQL Server?

2 Answers

#1


3  

Here is the solution:

DECLARE @t TABLE ([Date] DATETIME, Ind INT)

INSERT INTO @t 
VALUES
('2016-01-02', 1),
('2016-01-03', 5),
('2016-03-02', 10),
('2016-03-05', 15),
('2016-05-10', 6),
('2016-05-11', 2)

SELECT [Date],
       Ind,
       1 + SUM([Group]) OVER(ORDER BY [Date]) AS [Group]
FROM 
(
    SELECT  *, 
            CASE WHEN LAG(ind) OVER(ORDER BY [Date]) >= 10 
                THEN 1 
                ELSE 0 
            END AS [Group] 
      FROM @t
) t

Just mark a row with 1 when the previous row's Ind is greater than or equal to 10, otherwise 0. A running sum over those marks then gives you the desired group number.
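To make the two steps easier to follow, here is a sketch that keeps the intermediate flag visible next to the final group number. It uses the same sample data as above; the names Flagged and BreakBefore are illustrative and not part of the original answer, and the explicit ROWS frame is an added assumption so that rows sharing the same [Date] would still be counted one at a time.

```sql
-- Sketch only: Flagged and BreakBefore are illustrative names.
DECLARE @t TABLE ([Date] DATETIME, Ind INT);

INSERT INTO @t ([Date], Ind)
VALUES
('2016-01-02', 1),
('2016-01-03', 5),
('2016-03-02', 10),
('2016-03-05', 15),
('2016-05-10', 6),
('2016-05-11', 2);

WITH Flagged AS
(
    SELECT [Date],
           Ind,
           -- LAG returns NULL on the first row, so the comparison is
           -- not true there and the flag falls through to 0.
           CASE WHEN LAG(Ind) OVER (ORDER BY [Date]) >= 10
                THEN 1
                ELSE 0
           END AS BreakBefore
    FROM @t
)
SELECT [Date],
       Ind,
       BreakBefore,
       -- Running total of the flags, shifted so the first group is 1.
       1 + SUM(BreakBefore) OVER (ORDER BY [Date]
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS [Group]
FROM Flagged
ORDER BY [Date];

-- BreakBefore per row: 0, 0, 0, 1, 1, 0
-- [Group] per row:     1, 1, 1, 2, 3, 3
```

Only the CASE condition is specific to this problem; any expression that decides "a new group starts here" can take its place.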

#2


1  

Giving full credit to Giorgi for the idea, but I've modified his answer (both for my benefit and for future readers).

Just change the CASE expression to check whether 30 or more days have elapsed since the previous record:

DECLARE @t TABLE ([Date] DATETIME)

INSERT INTO @t 
VALUES
('2016-01-02'),
('2016-01-03'),
('2016-03-02'),
('2016-03-05'),
('2016-05-10'),
('2016-05-11')

SELECT [Date],
       1 + SUM([Group]) OVER(ORDER BY [Date]) AS [Group]
FROM 
(
    SELECT  [Date], 
            CASE WHEN DATEADD(d, -30, [Date]) >= LAG([Date]) OVER(ORDER BY [Date])
                THEN 1 
                ELSE 0 
            END AS [Group] 
      FROM @t
) t
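Once each row carries a group number, one possible follow-up is to collapse every group into a single summary row. The sketch below assumes that is the goal; the names Flagged, Grouped, GapBefore, FirstDate, LastDate and RowsInGroup are illustrative and do not come from the answers above.

```sql
-- Sketch only: CTE and column names are illustrative.
DECLARE @t TABLE ([Date] DATETIME);

INSERT INTO @t ([Date])
VALUES
('2016-01-02'),
('2016-01-03'),
('2016-03-02'),
('2016-03-05'),
('2016-05-10'),
('2016-05-11');

WITH Flagged AS
(
    SELECT [Date],
           -- 1 when at least 30 days separate this row from the previous one.
           CASE WHEN DATEADD(d, -30, [Date]) >= LAG([Date]) OVER (ORDER BY [Date])
                THEN 1
                ELSE 0
           END AS GapBefore
    FROM @t
),
Grouped AS
(
    SELECT [Date],
           1 + SUM(GapBefore) OVER (ORDER BY [Date]
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS [Group]
    FROM Flagged
)
SELECT [Group],
       MIN([Date]) AS FirstDate,
       MAX([Date]) AS LastDate,
       COUNT(*)    AS RowsInGroup
FROM Grouped
GROUP BY [Group]
ORDER BY [Group];

-- Group 1: 2016-01-02 to 2016-01-03, 2 rows
-- Group 2: 2016-03-02 to 2016-03-05, 2 rows
-- Group 3: 2016-05-10 to 2016-05-11, 2 rows
```

The window functions have to be evaluated in their own CTE (or derived table) first, because SQL Server does not allow a window function inside GROUP BY.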
