I have a table that looks like the following. For each "ID", I want rows that continue one another to be grouped together. The IsContinued column marks whether the next row should be combined with the current row.
My data looks like this:
+-----+--------+-------------+-----------+----------+
| ID | Period | IsContinued | StartDate | EndDate |
+-----+--------+-------------+-----------+----------+
| 123 | 1 | 1 | 20180101 | 20180404 |
+-----+--------+-------------+-----------+----------+
| 123 | 2 | 1 | 20180501 | 20180910 |
+-----+--------+-------------+-----------+----------+
| 123 | 3 | 0 | 20181001 | 20181201 |
+-----+--------+-------------+-----------+----------+
| 123 | 4 | 1 | 20190105 | 20190228 |
+-----+--------+-------------+-----------+----------+
| 123 | 5 | 0 | 20190401 | 20190430 |
+-----+--------+-------------+-----------+----------+
| 456 | 2 | 1 | 20180201 | 20180215 |
+-----+--------+-------------+-----------+----------+
| 456 | 3 | 0 | 20180301 | 20180401 |
+-----+--------+-------------+-----------+----------+
| 456 | 4 | 0 | 20180501 | 20180530 |
+-----+--------+-------------+-----------+----------+
| 456 | 5 | 0 | 20180701 | 20180705 |
+-----+--------+-------------+-----------+----------+
The end result I want is this:
+-----+-------------+-----------+-----------+----------+
| ID | PeriodStart | PeriodEnd | StartDate | EndDate |
+-----+-------------+-----------+-----------+----------+
| 123 | 1 | 3 | 20180101 | 20181201 |
+-----+-------------+-----------+-----------+----------+
| 123 | 4 | 5 | 20190105 | 20190430 |
+-----+-------------+-----------+-----------+----------+
| 456 | 2 | 3 | 20180201 | 20180401 |
+-----+-------------+-----------+-----------+----------+
| 456 | 4 | 4 | 20180501 | 20180530 |
+-----+-------------+-----------+-----------+----------+
| 456 | 5 | 5 | 20180701 | 20180705 |
+-----+-------------+-----------+-----------+----------+
DDL Statement:
CREATE TABLE #Period (ID INT, PeriodNr INT, IsContinued INT, STARTDATE DATE, ENDDATE DATE)
INSERT INTO #Period VALUES (123,1,1,'20180101', '20180404'),
(123,2,1,'20180501', '20180910'),
(123,3,0,'20181001', '20181201'),
(123,4,1,'20190105', '20190228'),
(123,5,0,'20190401', '20190430'),
(456,2,1,'20180201', '20180215'),
(456,3,0,'20180301', '20180401'),
(456,4,0,'20180501', '20180530'),
(456,5,0,'20180701', '20180705')
The code should run on SQL Server 2016.
Thanks!
1 Answer
Here is one approach:
with removeFluff as
(
SELECT *
FROM (
SELECT ID, PeriodNr, IsContinued, STARTDATE, ENDDATE, LAG(IsContinued,1,2) OVER (PARTITION BY ID ORDER BY PERIODNR) Lag
FROM #Period
) A
WHERE (IsContinued <> Lag) OR (IsContinued + Lag = 0)
)
,getValues as
(
SELECT ID,
CASE WHEN LAG(IsContinued) OVER (PARTITION BY ID ORDER BY PeriodNr) = 1 THEN LAG(PeriodNr) OVER (PARTITION BY ID ORDER BY PeriodNr) ELSE PeriodNr END PeriodStart,
PeriodNr PeriodEnd,
CASE WHEN LAG(IsContinued) OVER (PARTITION BY ID ORDER BY PeriodNr) = 1 THEN LAG(STARTDATE) OVER (PARTITION BY ID ORDER BY PeriodNr) ELSE STARTDATE END StartDate,
EndDate,
IsContinued
FROM removeFluff r
)
SELECT ID, PeriodStart, PeriodEnd, StartDate, EndDate
FROM getValues
WHERE IsContinued = 0
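Output:
+-----+-------------+-----------+------------+------------+
| ID  | PeriodStart | PeriodEnd | StartDate  | EndDate    |
+-----+-------------+-----------+------------+------------+
| 123 | 1           | 3         | 2018-01-01 | 2018-12-01 |
+-----+-------------+-----------+------------+------------+
| 123 | 4           | 5         | 2019-01-05 | 2019-04-30 |
+-----+-------------+-----------+------------+------------+
| 456 | 2           | 3         | 2018-02-01 | 2018-04-01 |
+-----+-------------+-----------+------------+------------+
| 456 | 4           | 4         | 2018-05-01 | 2018-05-30 |
+-----+-------------+-----------+------------+------------+
| 456 | 5           | 5         | 2018-07-01 | 2018-07-05 |
+-----+-------------+-----------+------------+------------+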
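To see which rows the removeFluff CTE in the query above actually keeps, its inner query can be run on its own. This is only a sketch for inspection against the #Period table from the question. The alias PrevIsContinued is my own renaming of the answer's Lag column, and the ORDER BY is added purely for readability.

```sql
-- Inspect the rows that removeFluff keeps (sample data from the question).
-- PrevIsContinued is a hypothetical alias standing in for the answer's "Lag" column.
SELECT ID, PeriodNr, IsContinued, PrevIsContinued
FROM (
    SELECT ID, PeriodNr, IsContinued,
           -- The default of 2 marks "no previous row", so the first row
           -- of each ID always differs from its own IsContinued value.
           LAG(IsContinued, 1, 2) OVER (PARTITION BY ID ORDER BY PeriodNr) AS PrevIsContinued
    FROM #Period
) AS A
WHERE IsContinued <> PrevIsContinued        -- row starts or ends a run
   OR IsContinued + PrevIsContinued = 0     -- standalone row after a finished row
ORDER BY ID, PeriodNr;
```

With the sample data, the only row dropped should be period 2 of ID 123, because both it and the row before it have IsContinued = 1.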
Method:
- The removeFluff cte removes rows that are unimportant. These are the records that neither start nor end a segment (row 2 in your sample data).
- Now that the fluff is removed, we know that either:
  - A.) The row is complete on its own (LAG(IsContinued) ... = 0), i.e. the previous row is complete, or
  - B.) The row needs the "start" info from the previous row (LAG(IsContinued) ... = 1).
- We apply these two cases in the CASE expressions of the getValues cte.
- Last, the final select narrows the results to only the important rows with IsContinued = 0. Because we used LAG to pull the "start" data onto the "end" row, we only want to select the end rows.
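For comparison, the same grouping can also be written as a running-sum ("gaps and islands") query. This is a different technique from the one explained above, not a restatement of it, and it is only a sketch against the #Period table from the question. The names flagged, grouped, IsGroupStart and GroupNr are made up for illustration.

```sql
-- Alternative sketch: number the groups with a running sum, then aggregate.
WITH flagged AS
(
    SELECT ID, PeriodNr, IsContinued, STARTDATE, ENDDATE,
           -- A row opens a new group when the previous row was not continued.
           -- The default of 0 makes the first row of each ID open a group.
           CASE WHEN LAG(IsContinued, 1, 0) OVER (PARTITION BY ID ORDER BY PeriodNr) = 0
                THEN 1 ELSE 0 END AS IsGroupStart
    FROM #Period
)
,grouped AS
(
    SELECT ID, PeriodNr, STARTDATE, ENDDATE,
           -- Running count of group starts = group number within the ID.
           SUM(IsGroupStart) OVER (PARTITION BY ID ORDER BY PeriodNr
                                   ROWS UNBOUNDED PRECEDING) AS GroupNr
    FROM flagged
)
SELECT ID,
       MIN(PeriodNr)  AS PeriodStart,
       MAX(PeriodNr)  AS PeriodEnd,
       MIN(STARTDATE) AS StartDate,
       MAX(ENDDATE)   AS EndDate
FROM grouped
GROUP BY ID, GroupNr
ORDER BY ID, PeriodStart;
```

One behavioral difference to be aware of: if the last row of an ID has IsContinued = 1, this sketch still returns that open-ended group, whereas the query above drops it because of the final IsContinued = 0 filter.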