How to merge overlapping times within a window?

Time: 2021-07-11 19:15:07

I have a table like this:

CREATE TABLE #TEMP (Name VARCHAR(255), START_TIME datetime, END_TIME datetime);

INSERT INTO #TEMP VALUES('John', '2012-01-01 09:00:01', '2012-01-01 12:00:02')
INSERT INTO #TEMP VALUES('John', '2012-01-01 09:40:01', '2012-01-01 11:00:02')
INSERT INTO #TEMP VALUES('John', '2012-01-02 05:00:01', '2012-01-02 05:15:02')
INSERT INTO #TEMP VALUES('David', '2012-01-04 05:00:01', '2012-01-04 05:15:02')
INSERT INTO #TEMP VALUES('David', '2012-01-05 07:01:01', '2012-01-05 15:15:02')

SELECT *
FROM #TEMP

DROP TABLE #TEMP

And the data is:

     Name   START_TIME                 END_TIME
1    John   2012-01-01 09:00:01.000    2012-01-01 12:00:02.000
2    John   2012-01-01 09:40:01.000    2012-01-01 11:00:02.000
3    John   2012-01-02 05:00:01.000    2012-01-02 05:15:02.000
4    David  2012-01-04 05:00:01.000    2012-01-04 05:15:02.000
5    David  2012-01-05 07:01:01.000    2012-01-05 08:15:02.000

Given a number, say 6, I am trying to do a GROUP BY on this table and merge times that overlap once each range is widened by a window of 6 hours before and after. For example, rows 1 and 2 in the table above would be merged into a single row because their time ranges overlap:

John 2012-01-01 06:00:01.000 2012-01-01 18:00:02.000

Rows 4 and 5 would also be merged, because subtracting 6 hours from 07:01:01.000 lands inside the window of row 4.

Is there a good way of doing this on a large table containing about a million rows?
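
To pin down what "merge within a window" means, here is a minimal in-memory sketch in Python rather than T-SQL. It uses the standard sort-and-sweep interval merge (a swapped-in technique, not something from the question): widen every row by 6 hours on each side, sort by name and widened start, then fold each interval into the current group while it overlaps. The function name `merge_with_window` and the (name, start, end) tuple layout are illustrative assumptions; the rows are the values from the INSERT statements.

```python
from datetime import datetime, timedelta


def merge_with_window(rows, hours):
    """Widen each (name, start, end) row by `hours` on both sides, then
    merge widened intervals of the same name that overlap or touch."""
    pad = timedelta(hours=hours)
    # Sorting by (name, widened start) puts merge candidates next to each other.
    widened = sorted((name, start - pad, end + pad) for name, start, end in rows)
    merged = []
    for name, start, end in widened:
        if merged and merged[-1][0] == name and start <= merged[-1][2]:
            # Overlaps the current group: extend its end if needed.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            # New name, or a gap after the current group: start a new group.
            merged.append([name, start, end])
    return [tuple(group) for group in merged]


ts = datetime.fromisoformat
rows = [
    ("John", ts("2012-01-01 09:00:01"), ts("2012-01-01 12:00:02")),
    ("John", ts("2012-01-01 09:40:01"), ts("2012-01-01 11:00:02")),
    ("John", ts("2012-01-02 05:00:01"), ts("2012-01-02 05:15:02")),
    ("David", ts("2012-01-04 05:00:01"), ts("2012-01-04 05:15:02")),
    ("David", ts("2012-01-05 07:01:01"), ts("2012-01-05 15:15:02")),
]

for name, start, end in merge_with_window(rows, 6):
    print(name, start, end)
```

For John's rows 1 and 2 this yields 2012-01-01 03:00:01 to 2012-01-01 18:00:02 (09:00:01 minus 6 hours is 03:00:01). With the rows as inserted, David's rows fall on 2012-01-04 and 2012-01-05, so their widened intervals do not touch and stay separate. The cost is one sort plus one linear pass, which is what tends to matter at around a million rows.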

1 solution

#1

I think the best way to do this is to create a window table and join the #temp table with it:

1) Build a table of all candidate windows, one per row (these windows may overlap):

   SELECT 
      Name,
      dateadd(hour, -6, start_time) as start_w, 
      dateadd(hour, +6, start_time) as end_w
   into #possible_windows
   FROM #TEMP 
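
Step 1 as a small Python sketch, for checking the arithmetic (`candidate_windows` is a hypothetical name). Like the SQL above, it centers each window on START_TIME and ignores END_TIME.

```python
from datetime import datetime, timedelta


def candidate_windows(rows, hours=6):
    """Mirror of step 1: one (name, start_w, end_w) window per row,
    centered on the row's start time; the end time is ignored, as in the SQL."""
    pad = timedelta(hours=hours)
    return [(name, start - pad, start + pad) for name, start, _end in rows]


ts = datetime.fromisoformat
rows = [
    ("John", ts("2012-01-01 09:00:01"), ts("2012-01-01 12:00:02")),
    ("David", ts("2012-01-04 05:00:01"), ts("2012-01-04 05:15:02")),
]

for name, start_w, end_w in candidate_windows(rows):
    print(name, start_w, end_w)
```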

2) Create an index on the temp table to improve performance:

   create index pw_idx on #possible_windows ( Name, start_w)

3) Eliminate overlapping windows with a self-join. This is why the index was created:

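   -- Keep a window (p2) only if its start does not fall inside another
   -- window (p1) of the same name; the (Name, start_w) index supports this.
   select p2.*
   into #myWindows
   from #possible_windows p1
   right outer join #possible_windows p2
     on p1.name = p2.name and
        p2.start_w > p1.start_w and p2.start_w <= p1.end_w
   where p1.name is null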
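
The same anti-join semantics as a Python sketch (`eliminate_overlaps` is a hypothetical name). It uses a plain nested loop for readability, so it is quadratic; the SQL version is meant to lean on the (Name, start_w) index instead.

```python
from datetime import datetime, timedelta


def eliminate_overlaps(windows):
    """Mirror of step 3's anti-join: keep window p2 only if no window p1 of
    the same name satisfies p1.start_w < p2.start_w <= p1.end_w."""
    kept = []
    for name2, start2, end2 in windows:
        # Nested loop for clarity: O(n^2) comparisons.
        covered = any(
            name1 == name2 and start1 < start2 <= end1
            for name1, start1, end1 in windows
        )
        if not covered:
            kept.append((name2, start2, end2))
    return kept


ts = datetime.fromisoformat
pad = timedelta(hours=6)
starts = [
    ("John", "2012-01-01 09:00:01"),
    ("John", "2012-01-01 09:40:01"),
    ("John", "2012-01-02 05:00:01"),
    ("David", "2012-01-04 05:00:01"),
    ("David", "2012-01-05 07:01:01"),
]
# Step 1: a +/-6-hour window around each START_TIME.
windows = [(name, ts(s) - pad, ts(s) + pad) for name, s in starts]

for name, start_w, end_w in sorted(eliminate_overlaps(windows)):
    print(name, start_w, end_w)
```

On the sample data this keeps four windows, matching the RESULTS listing. Note that a kept window's end_w is not extended to cover the windows it absorbs: John's first window ends at 15:00:01 even though the dropped window for row 2 runs to 15:40:01.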

4) Join your table with #myWindows or use it directly.
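
The join condition for step 4 is left open. One plausible reading, sketched in Python under that assumption: a row belongs to a kept window when the names match and its START_TIME lies between start_w and end_w (in T-SQL, roughly `t.Name = w.Name and t.START_TIME between w.start_w and w.end_w`). `assign_rows` is a hypothetical helper, and the two windows are John's kept windows from the results.

```python
from datetime import datetime


def assign_rows(rows, windows):
    """Hypothetical step 4: attach each (name, start, end) row to the first
    kept window of the same name whose [start_w, end_w] contains its start."""
    groups = {window: [] for window in windows}
    # Rows that fall inside no kept window are left unassigned.
    for name, start, end in rows:
        for window in windows:
            w_name, start_w, end_w = window
            if w_name == name and start_w <= start <= end_w:
                groups[window].append((start, end))
                break
    return groups


ts = datetime.fromisoformat
windows = [
    ("John", ts("2012-01-01 03:00:01"), ts("2012-01-01 15:00:01")),
    ("John", ts("2012-01-01 23:00:01"), ts("2012-01-02 11:00:01")),
]
rows = [
    ("John", ts("2012-01-01 09:00:01"), ts("2012-01-01 12:00:02")),
    ("John", ts("2012-01-01 09:40:01"), ts("2012-01-01 11:00:02")),
    ("John", ts("2012-01-02 05:00:01"), ts("2012-01-02 05:15:02")),
]

for (name, start_w, end_w), members in assign_rows(rows, windows).items():
    print(name, start_w, end_w, len(members))
```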

Working example:

-- Step 1: one +/-6-hour window per row, centered on START_TIME,
-- numbered per Name in start order.
SELECT
  Name,
  dateadd(hour, -6, start_time) as start_w,
  dateadd(hour, +6, start_time) as end_w,
  ROW_NUMBER() over(partition by Name order by start_time) as rn
into #possible_windows
FROM #TEMP

-- Step 2: index to support the self-join.
create index pw_idx on #possible_windows (Name, start_w)

-- Step 3: keep only windows whose start is not inside another window.
select p2.*
from #possible_windows p1
right outer join #possible_windows p2
  on p1.name = p2.name and
     p2.start_w > p1.start_w and p2.start_w <= p1.end_w
where p1.name is null
order by p2.Name, p2.start_w

RESULTS:

Name  start_w              end_w                rn
----- -------------------- -------------------- --
David 2012-01-03 23:00:01  2012-01-04 11:00:01  1
David 2012-01-05 01:01:01  2012-01-05 13:01:01  2
John  2012-01-01 03:00:01  2012-01-01 15:00:01  1
John  2012-01-01 23:00:01  2012-01-02 11:00:01  3
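
The gap in John's rn values (1, then 3) comes from the order of operations: rn is assigned in step 1, before the self-join drops John's second window. A Python sketch of that numbering, emulating ROW_NUMBER() OVER (PARTITION BY Name ORDER BY start_w) with `itertools.groupby` (`row_numbers` is a hypothetical name):

```python
from datetime import datetime
from itertools import groupby


def row_numbers(windows):
    """Emulate ROW_NUMBER() OVER (PARTITION BY name ORDER BY start_w)
    for (name, start_w) pairs: number 1, 2, ... within each name."""
    ordered = sorted(windows)  # by name, then start_w
    numbered = []
    for _name, group in groupby(ordered, key=lambda w: w[0]):
        for rn, (name, start_w) in enumerate(group, start=1):
            numbered.append((name, start_w, rn))
    return numbered


ts = datetime.fromisoformat
windows = [
    ("John", ts("2012-01-01 03:00:01")),
    ("John", ts("2012-01-01 03:40:01")),
    ("John", ts("2012-01-01 23:00:01")),
]
numbered = row_numbers(windows)
# Step 3 drops the window starting at 03:40:01 (it starts inside the first one).
kept = [w for w in numbered if w[1] != ts("2012-01-01 03:40:01")]
print([rn for _name, _start_w, rn in kept])
```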

PS: Please come back and share your performance test results!
