T-SQL连续组合行

时间:2021-01-10 14:20:45

I have a table that looks like the following. What I want is the the rows in continuation of each other to be grouped together - for each "ID". The column IsContinued marks if the next row should be combined with the current row

我有一个如下所示的表格。我想要的是将彼此连续的行组合在一起 - 对于每个“ID”。列IsContinued标记下一行是否应与当前行组合

My data looks like this:

我的数据如下所示:

+-----+--------+-------------+-----------+----------+
| ID  | Period | IsContinued | StartDate | EndDate  |
+-----+--------+-------------+-----------+----------+
| 123 | 1      | 1           | 20180101  | 20180404 |
+-----+--------+-------------+-----------+----------+
| 123 | 2      | 1           | 20180501  | 20180910 |
+-----+--------+-------------+-----------+----------+
| 123 | 3      | 0           | 20181001  | 20181201 |
+-----+--------+-------------+-----------+----------+
| 123 | 4      | 1           | 20190105  | 20190228 |
+-----+--------+-------------+-----------+----------+
| 123 | 5      | 0           | 20190401  | 20190430 |
+-----+--------+-------------+-----------+----------+
| 456 | 2      | 1           | 20180201  | 20180215 |
+-----+--------+-------------+-----------+----------+
| 456 | 3      | 0           | 20180301  | 20180401 |
+-----+--------+-------------+-----------+----------+
| 456 | 4      | 0           | 20180501  | 20180530 |
+-----+--------+-------------+-----------+----------+
| 456 | 5      | 0           | 20180701  | 20180705 |
+-----+--------+-------------+-----------+----------+

The end result I want is this:

我想要的最终结果是:

+-----+-------------+-----------+-----------+----------+
| ID  | PeriodStart | PeriodEnd | StartDate | EndDate  |
+-----+-------------+-----------+-----------+----------+
| 123 | 1           | 3         | 20180101  | 20181201 |
+-----+-------------+-----------+-----------+----------+
| 123 | 4           | 5         | 20190105  | 20190430 |
+-----+-------------+-----------+-----------+----------+
| 456 | 2           | 3         | 20180201  | 20180401 |
+-----+-------------+-----------+-----------+----------+
| 456 | 4           | 4         | 20180501  | 20180530 |
+-----+-------------+-----------+-----------+----------+
| 456 | 5           | 5         | 20180701  | 20180705 |
+-----+-------------+-----------+-----------+----------+

DDL Statement:

CREATE TABLE #Period (ID INT, PeriodNr INT, IsContinued INT, STARTDATE DATE, ENDDATE DATE)
INSERT INTO #Period VALUES (123,1,1,'20180101', '20180404'),
                      (123,2,1,'20180501', '20180910'),
                      (123,3,0,'20181001', '20181201'),
                      (123,4,1,'20190105', '20190228'),
                      (123,5,0,'20190401', '20190430'),
                      (456,2,1,'20180201', '20180215'),
                      (456,3,0,'20180301', '20180401'),
                      (456,4,0,'20180501', '20180530'),
                      (456,5,0,'20180701', '20180705')

The code should be run on SQL Server 2016

代码应该在SQL Server 2016上运行

Thanks!

1 个解决方案

#1


4  

Here is one approach:

这是一种方法:

with removeFluff as
(
SELECT *
FROM (
        SELECT ID, PeriodNr, IsContinued, STARTDATE, ENDDATE, LAG(IsContinued,1,2) OVER (PARTITION BY ID ORDER BY PERIODNR) Lag
        FROM #Period
     ) A
WHERE (IsContinued <> Lag) OR (IsContinued + Lag = 0)
)    
,getValues as
(
SELECT ID,
       CASE WHEN LAG(IsContinued) OVER (PARTITION BY ID ORDER BY PeriodNr) = 1 THEN LAG(PeriodNr) OVER (PARTITION BY ID ORDER BY PeriodNr) ELSE PeriodNr END PeriodStart,
       PeriodNr PeriodEnd,
       CASE WHEN LAG(IsContinued) OVER (PARTITION BY ID ORDER BY PeriodNr) = 1 THEN LAG(STARTDATE) OVER (PARTITION BY ID ORDER BY PeriodNr) ELSE STARTDATE END StartDate,
       EndDate,
       IsContinued
FROM removeFluff r
)

SELECT ID, PeriodStart, PeriodEnd, StartDate, EndDate
FROM getValues
WHERE IsContinued = 0

Output:

ID  PeriodStart PeriodEnd   StartDate   EndDate
123    1           3        2018-01-01  2018-12-01
123    4           5        2019-01-05  2019-04-30
456    2           3        2018-02-01  2018-04-01
456    4           4        2018-05-01  2018-05-30
456    5           5        2018-07-01  2018-07-05

Method:

  • removeFluff cte removes lines that are unimportant. Theses are the records that don't start or end a segment (line 2 in your sample data)
  • removeFluff cte删除不重要的行。这些是不开始或结束片段的记录(样本数据中的第2行)

  • Now that the fluff is removed, we know that either:
  • 现在除去了绒毛,我们知道:

  • A.) The line is complete on it's own (LAG(IsContinued) ... = 0), ie. previous line is complete
  • A.)该线自己完成(LAG(IsContinued)... = 0),即。上一行已完成

  • B.) The line needs the "start" info from the previous line (LAG(IsContinued) ... = 1)
  • B.)该行需要前一行的“开始”信息(LAG(IsContinued)... = 1)

  • We apply these two cases in the CASE expression of the getValues cte
  • 我们将这两种情况应用于getValues cte的CASE表达式

  • Last, the results are narrowed to only the important rows in the final select with IsContinued = 0. This is because we have used LAG to get "start" data on the "end" data row, so we only want to select the end rows
  • 最后,结果被缩小到只有IsContinued = 0的最终选择中的重要行。这是因为我们使用LAG在“end”数据行上获取“start”数据,所以我们只想选择结束行

#1


4  

Here is one approach:

这是一种方法:

with removeFluff as
(
SELECT *
FROM (
        SELECT ID, PeriodNr, IsContinued, STARTDATE, ENDDATE, LAG(IsContinued,1,2) OVER (PARTITION BY ID ORDER BY PERIODNR) Lag
        FROM #Period
     ) A
WHERE (IsContinued <> Lag) OR (IsContinued + Lag = 0)
)    
,getValues as
(
SELECT ID,
       CASE WHEN LAG(IsContinued) OVER (PARTITION BY ID ORDER BY PeriodNr) = 1 THEN LAG(PeriodNr) OVER (PARTITION BY ID ORDER BY PeriodNr) ELSE PeriodNr END PeriodStart,
       PeriodNr PeriodEnd,
       CASE WHEN LAG(IsContinued) OVER (PARTITION BY ID ORDER BY PeriodNr) = 1 THEN LAG(STARTDATE) OVER (PARTITION BY ID ORDER BY PeriodNr) ELSE STARTDATE END StartDate,
       EndDate,
       IsContinued
FROM removeFluff r
)

SELECT ID, PeriodStart, PeriodEnd, StartDate, EndDate
FROM getValues
WHERE IsContinued = 0

Output:

ID  PeriodStart PeriodEnd   StartDate   EndDate
123    1           3        2018-01-01  2018-12-01
123    4           5        2019-01-05  2019-04-30
456    2           3        2018-02-01  2018-04-01
456    4           4        2018-05-01  2018-05-30
456    5           5        2018-07-01  2018-07-05

Method:

  • removeFluff cte removes lines that are unimportant. Theses are the records that don't start or end a segment (line 2 in your sample data)
  • removeFluff cte删除不重要的行。这些是不开始或结束片段的记录(样本数据中的第2行)

  • Now that the fluff is removed, we know that either:
  • 现在除去了绒毛,我们知道:

  • A.) The line is complete on it's own (LAG(IsContinued) ... = 0), ie. previous line is complete
  • A.)该线自己完成(LAG(IsContinued)... = 0),即。上一行已完成

  • B.) The line needs the "start" info from the previous line (LAG(IsContinued) ... = 1)
  • B.)该行需要前一行的“开始”信息(LAG(IsContinued)... = 1)

  • We apply these two cases in the CASE expression of the getValues cte
  • 我们将这两种情况应用于getValues cte的CASE表达式

  • Last, the results are narrowed to only the important rows in the final select with IsContinued = 0. This is because we have used LAG to get "start" data on the "end" data row, so we only want to select the end rows
  • 最后,结果被缩小到只有IsContinued = 0的最终选择中的重要行。这是因为我们使用LAG在“end”数据行上获取“start”数据,所以我们只想选择结束行