How to merge time intervals in SQL Server

Suppose I have an event table with personId, startDate and endDate columns.

I want to know how much time person X spent doing events (the events can overlap each other).

If the person has just 1 event, it's easy: datediff(dd, startDate, endDate)

If the person has 2 events it gets tricky.

I'll set some scenarios for the expected results.

Scenario 1

startDate endDate
1         4
3         5

This means the result should be the datediff from 1 to 5.

Scenario 2

startDate endDate
1         3
6         9

This means the result should be the sum of datediff(dd,1,3) and datediff(dd,6,9).

How can I get this result in a SQL query? I can only think of a bunch of if statements, but the same person can have n events, so the query would be really confusing.

Shredder Edit: I'd like to add a 3rd scenario:

startDate endDate
1       5
4       8
11      15

Desired result for Shredder's scenario:

(1,5) and (4,8) merge into (1,8) since they overlap, so we need datediff(1,8) + datediff(11,15) => 7 + 4 => 11

7 solutions

#1

You can use a recursive CTE to build a list of dates and then count the distinct dates.

declare @T table
(
  startDate date,
  endDate date
);

insert into @T values
('2011-01-01', '2011-01-05'),
('2011-01-04', '2011-01-08'),
('2011-01-11', '2011-01-15');

with C as
(
  select startDate,
         endDate
  from @T
  union all
  select dateadd(day, 1, startDate),
         endDate
  from C
  where dateadd(day, 1, startDate) < endDate       
)
select count(distinct startDate) as DayCount
from C
option (MAXRECURSION 0)

Result:

DayCount
-----------
11

Or you can use a numbers table. Here I use master..spt_values:

declare @MinStartDate date
select @MinStartDate = min(startDate)
from @T

select count(distinct N.number)
from @T as T
  inner join master..spt_values as N
    on dateadd(day, N.Number, @MinStartDate) between T.startDate and dateadd(day, -1, T.endDate)
where N.type = 'P'
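
Both queries above return a single total for the whole table. If a total per person is needed, as in the question, a variation on the numbers-table query might look like the sketch below. The table variable @E and its sample rows are assumptions for illustration, and master..spt_values is an undocumented system table whose type = 'P' rows cover 0 to 2047, so intervals longer than 2048 days would need a real numbers table.

```sql
-- Hypothetical sample data: scenario 3 for person 1, scenario 2 for person 2.
DECLARE @E TABLE (personId int, startDate date, endDate date);
INSERT INTO @E VALUES
(1, '2011-01-01', '2011-01-05'),
(1, '2011-01-04', '2011-01-08'),
(1, '2011-01-11', '2011-01-15'),
(2, '2011-01-01', '2011-01-03'),
(2, '2011-01-06', '2011-01-09');

-- Expand every interval into its days (start day included, end day excluded),
-- then count each calendar day once per person.
SELECT E.personId,
       COUNT(DISTINCT DATEADD(day, N.number, E.startDate)) AS DayCount
FROM @E AS E
  INNER JOIN master..spt_values AS N
    ON N.type = 'P'
   AND N.number < DATEDIFF(day, E.startDate, E.endDate)
GROUP BY E.personId;
-- personId 1 -> 11, personId 2 -> 5
```

Joining each row to the numbers 0 .. datediff - 1 avoids the need for a shared @MinStartDate, so the same query covers any number of persons.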

#2

Here's a solution that uses the Tally table idea (which I first heard of in an article by Itzik Ben-Gan; I still cut and paste his code whenever the subject comes up). The idea is to generate a list of ascending integers, join the source data by range against the numbers, and then count the number of distinct numbers, as follows. (This code uses syntax from SQL Server 2008, but with minor modifications it would work in SQL 2005.)

First set up some testing data:

CREATE TABLE #EventTable
 (
   PersonId   int  not null
  ,startDate  datetime  not null
  ,endDate    datetime  not null
 )

INSERT #EventTable
 values (1, 'Jan 1, 2011', 'Jan 4, 2011')
       ,(1, 'Jan 3, 2011', 'Jan 5, 2011')
       ,(2, 'Jan 1, 2011', 'Jan 3, 2011')
       ,(2, 'Jan 6, 2011', 'Jan 9, 2011')

Determine some initial values:

DECLARE @Interval bigint ,@FirstDay datetime ,@PersonId int = 1 -- (or whatever)

Get the first day and the maximum possible number of dates (to keep the cte from generating extra values):

SELECT
   @Interval = datediff(dd, min(startDate), max(endDate)) + 1
  ,@FirstDay = min(startDate)
 from #EventTable
 where PersonId = @PersonId

Copy and paste the tally routine, then modify and test it so that it returns only as many integers as we need:

/*
;WITH
  Pass0 as (select 1 as C union all select 1), --2 rows
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
  Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),--4,294,967,296 rows
  Tally as (select row_number() over(order by C) as Number from Pass5)
 select Number from Tally where Number <= @Interval
*/

Now revise it by first joining to the intervals defined in each source row, and then counting each distinct value found:

;WITH
  Pass0 as (select 1 as C union all select 1), --2 rows
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
  Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),--4,294,967,296 rows
  Tally as (select row_number() over(order by C) as Number from Pass5)
SELECT PersonId, count(distinct Number) EventDays
 from #EventTable et
  inner join Tally
   on dateadd(dd, Tally.Number - 1, @FirstDay) between et.startDate and et.endDate
 where et.PersonId = @PersonId
  and Number <= @Interval
 group by PersonId

Take out the @PersonId filter (in both the initial-values query and the final query) and you'd get it for all persons. With minor modifications you can do it for any time unit, not just days (which is why I set the Tally table up to generate extremely large numbers).

#3

The following SQL handles the three scenarios you've described:

with sampleData 
AS (


    SELECT       1 personid,1 startDate,4 endDate
    UNION SELECT 1,3,5
    UNION SELECT 2,1,3
    UNION SELECT 2,6,9
    UNION SELECT 3,1,5 
    UNION SELECT 3,4,8
    UNION SELECT 3,11, 15

), 
     cte 
     AS (SELECT personid, 
                startdate, 
                enddate, 
                Row_number() OVER(ORDER BY personid, startdate) AS rn 
         FROM   sampledata), 
     overlaps 
     AS (SELECT a.personid, 
                a.startdate, 
                b.enddate, 
                a.rn id1, 
                b.rn id2 
         FROM   cte a 
                INNER JOIN cte b 
                  ON a.personid = b.personid 
                     AND a.enddate > b.startdate 
                     AND a.rn = b.rn - 1), 
     nooverlaps 
     AS (SELECT a.personid, 
                a.startdate, 
                a.enddate 
         FROM   cte a 
                LEFT JOIN overlaps b 
                  ON a.rn = b.id1 
                      OR a.rn = b.id2 
         WHERE  b.id1 IS NULL) 
SELECT personid, 
       SUM(timespent) timespent 
FROM   (SELECT personid, 
               enddate - startdate timespent 
        FROM   nooverlaps 
        UNION ALL -- plain UNION would drop rows with equal (personid, timespent)
        SELECT personid, 
               enddate - startdate 
        FROM   overlaps) t 
GROUP  BY personid

This produces the following result:

Personid    timeSpent
----------- -----------
1           4
2           5
3           11

Note: I used simple integers, but DATEDIFF should work too.

Correctness issue: as Cheran S noted, if your data is allowed to have multiple overlaps, the results won't be correct and you should use one of the other answers instead. His example used [1,5], [4,8], [7,11] for the same person ID.
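
For data with chained overlaps like Cheran S's example, one possible approach is to flag the rows that start a new group of overlapping intervals and then merge each group. This is only a sketch, not part of the original answer: it assumes SQL Server 2012 or later (for the windowed MAX and SUM with a ROWS frame), and the table variable @Events and the names prevMaxEnd, isNewIsland and islandId are made up for illustration.

```sql
-- Hypothetical sample data: the chained case [1,5],[4,8],[7,11] plus one separate interval.
DECLARE @Events TABLE (personId int, startDate int, endDate int);
INSERT INTO @Events VALUES (1, 1, 5), (1, 4, 8), (1, 7, 11), (1, 20, 22);

WITH Ordered AS
(
    SELECT personId, startDate, endDate,
           -- Latest end among all earlier rows of the same person (NULL for the first row).
           MAX(endDate) OVER (PARTITION BY personId
                              ORDER BY startDate, endDate
                              ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prevMaxEnd
    FROM @Events
),
Flagged AS
(
    SELECT personId, startDate, endDate,
           -- 1 when the row does not touch anything before it, so it opens a new island.
           CASE WHEN prevMaxEnd IS NULL OR startDate > prevMaxEnd THEN 1 ELSE 0 END AS isNewIsland
    FROM Ordered
),
Islands AS
(
    SELECT personId, startDate, endDate,
           -- Running count of island starts gives every row its island number.
           SUM(isNewIsland) OVER (PARTITION BY personId
                                  ORDER BY startDate, endDate
                                  ROWS UNBOUNDED PRECEDING) AS islandId
    FROM Flagged
),
Merged AS
(
    SELECT personId, islandId,
           MIN(startDate) AS islandStart,
           MAX(endDate)   AS islandEnd
    FROM Islands
    GROUP BY personId, islandId
)
SELECT personId, SUM(islandEnd - islandStart) AS timeSpent
FROM Merged
GROUP BY personId;
-- personId 1 -> 12: islands (1,11) and (20,22) give 10 + 2
```

With real dates, DATEDIFF(day, islandStart, islandEnd) would replace the integer subtraction. Intervals that merely touch (one ends where the next starts) are merged here, which doesn't change the total.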

#4

Algebra. If B_n is the ending time of the nth event and A_n is the starting time of the nth event, then the sum of the differences is the difference of the sums. So you can write

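-- yourEventTable is a placeholder name; assumes datetime columns (date cannot be cast to int)
select personId, sum(cast(endDate as int)) - sum(cast(startDate as int)) as daysSpent
from yourEventTable
group by personId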

If your dates have no time component, this works. Otherwise, you could cast to real instead of int.
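
One caveat worth checking before using this: the identity holds, but it gives the sum of the individual durations, which matches the merged time only when no two events overlap. A quick check against the third scenario from the question, using integers in place of dates and a made-up table variable @T:

```sql
-- Hypothetical check using the question's third scenario, with integers instead of dates.
DECLARE @T TABLE (startDate int, endDate int);
INSERT INTO @T VALUES (1, 5), (4, 8), (11, 15);

-- (5 + 8 + 15) - (1 + 4 + 11) = 28 - 16 = 12,
-- one more than the 11 the question asks for, because the stretch from 4 to 5 is counted twice.
SELECT SUM(endDate) - SUM(startDate) AS sumOfDifferences
FROM @T;
```

So the shortcut seems safe only for data where overlaps have already been ruled out or merged.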

#5

Try something like this:

select 
    personId, 
    sum(DateDuration) as TotalDuration
from
(
    select personId, datediff(dd, startDate, endDate) as DateDuration
    from yourEventTable
) a
group by personId

#6

;WITH cte(gap)
AS
(
    SELECT sum(b-a) from xxx GROUP BY uid
)

SELECT * FROM cte

#7

Edit 1: I have modified both solutions to get correct results.

Edit 2: I have done comparative tests using the solutions proposed by Mikael Eriksson, Conrad Frix, Philip Kelley and me. All tests use an EventTable with the following structure:

CREATE TABLE EventTable
(
     EventID    INT IDENTITY PRIMARY KEY
    ,PersonId   INT NOT NULL
    ,StartDate  DATETIME NOT NULL
    ,EndDate    DATETIME NOT NULL
    ,CONSTRAINT CK_StartDate_Before_EndDate CHECK(StartDate < EndDate)
);

Also, all tests use a warm buffer (no DBCC DROPCLEANBUFFERS) and a cold [plan] cache (I executed DBCC FREEPROCCACHE before every test). Because some solutions use a filter (PersonId = 1) and others do not, I inserted rows into EventTable for only one person (INSERT ...(PersonId,...) VALUES (1,...)).

These are the results: [image of test results not preserved]

My solutions use recursive CTEs.

Solution 1:

WITH BaseCTE
AS
(
    SELECT   e.StartDate
            ,e.EndDate
            ,e.PersonId
            ,ROW_NUMBER() OVER(PARTITION BY e.PersonId ORDER BY e.StartDate, e.EndDate) RowNumber
    FROM    EventTable e
),  RecursiveCTE
AS
(
    SELECT   b.PersonId
            ,b.RowNumber

            ,b.StartDate
            ,b.EndDate
            ,b.EndDate AS MaxEndDate
            ,1 AS PseudoDenseRank
    FROM    BaseCTE b
    WHERE   b.RowNumber = 1
    UNION ALL
    SELECT   crt.PersonId
            ,crt.RowNumber

            ,crt.StartDate
            ,crt.EndDate
            ,CASE WHEN crt.EndDate > prev.MaxEndDate THEN crt.EndDate ELSE prev.MaxEndDate END
            ,CASE WHEN crt.StartDate <= prev.MaxEndDate THEN prev.PseudoDenseRank ELSE prev.PseudoDenseRank + 1 END
    FROM    RecursiveCTE prev
    INNER JOIN BaseCTE crt ON prev.PersonId = crt.PersonId
    AND     prev.RowNumber + 1 = crt.RowNumber
),  SumDaysPerPersonAndInterval
AS
(
    SELECT   src.PersonId
            ,src.PseudoDenseRank --Interval ID
            ,DATEDIFF(DAY, MIN(src.StartDate), MAX(src.EndDate)) Days
    FROM    RecursiveCTE src
    GROUP BY src.PersonId, src.PseudoDenseRank
)
SELECT  x.PersonId, SUM( x.Days ) DaysPerPerson
FROM    SumDaysPerPersonAndInterval x
GROUP BY x.PersonId
OPTION(MAXRECURSION 32767);

Solution 2:

DECLARE @Base TABLE --or a temporary table: CREATE TABLE #Base (...) 
(
     PersonID   INT NOT NULL
    ,StartDate  DATETIME NOT NULL
    ,EndDate    DATETIME NOT NULL
    ,RowNumber  INT NOT NULL
    ,PRIMARY KEY(PersonID, RowNumber)
);
INSERT  @Base (PersonID, StartDate, EndDate, RowNumber)
SELECT   e.PersonId
        ,e.StartDate
        ,e.EndDate
        ,ROW_NUMBER() OVER(PARTITION BY e.PersonID ORDER BY e.StartDate, e.EndDate) RowNumber
FROM    EventTable e;

WITH RecursiveCTE
AS
(
    SELECT   b.PersonId
            ,b.RowNumber

            ,b.StartDate
            ,b.EndDate
            ,b.EndDate AS MaxEndDate
            ,1 AS PseudoDenseRank
    FROM    @Base b
    WHERE   b.RowNumber = 1
    UNION ALL
    SELECT   crt.PersonId
            ,crt.RowNumber

            ,crt.StartDate
            ,crt.EndDate
            ,CASE WHEN crt.EndDate > prev.MaxEndDate THEN crt.EndDate ELSE prev.MaxEndDate END
            ,CASE WHEN crt.StartDate <= prev.MaxEndDate THEN prev.PseudoDenseRank ELSE prev.PseudoDenseRank + 1 END
    FROM    RecursiveCTE prev
    INNER JOIN @Base crt ON prev.PersonId = crt.PersonId
    AND     prev.RowNumber + 1 = crt.RowNumber
),  SumDaysPerPersonAndInterval
AS
(
    SELECT   src.PersonId
            ,src.PseudoDenseRank --Interval ID
            ,DATEDIFF(DAY, MIN(src.StartDate), MAX(src.EndDate)) Days
    FROM    RecursiveCTE src
    GROUP BY src.PersonId, src.PseudoDenseRank
)
SELECT  x.PersonId, SUM( x.Days ) DaysPerPerson
FROM    SumDaysPerPersonAndInterval x
GROUP BY x.PersonId
OPTION(MAXRECURSION 32767);

#1