Calculating the time difference between unique data rows

Time: 2021-10-05 21:31:40

Suppose I have a table that has two fields. Record_Created is a datetime and Action is just a string. The data may look like this:

Record_Created      Action
1/11/18 5:24 PM     Action 1
1/11/18 5:32 PM     Action 2
1/17/18 4:41 PM     Action 3
1/17/18 4:41 PM     Action 2
1/17/18 4:44 PM     Action 3
1/18/18 11:12 AM    Action 4
1/18/18 11:12 AM    Action 3
1/18/18 11:13 AM    Action 4
1/25/18 2:44 PM     Action 5

I need to calculate the time difference (in days) between consecutive actions. The comparison should not be between individual rows but between unique actions, using the last occurrence of each action. So my result data set should look like this:

Action  Difference
Action 2    6
Action 3    1
Action 4    0
Action 5    7

What's the best and most efficient way to achieve this, considering I have over a million records in this table to go through?

4 solutions

#1


1  

If I understand correctly, you want to take the last date per action, list the actions sorted by that date, and show the time span in days from one action to the next.

So aggregate by action to get the last date, and then use LAG to look at the previous row in date order.

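-- Last occurrence per action, then the gap in days to the
-- previous action's last occurrence (in date order).
select
  action,
  max(record_created) as last_created,
  datediff(day,
    lag(max(record_created)) over (order by max(record_created)),
    max(record_created)
  ) as diff
from actions
group by action
order by action;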

This query also includes the first action (with difference = null), but I guess you don't mind.

Rextester demo: http://rextester.com/EAA26233
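
To check the logic outside the database, here is a minimal Python sketch of the same two steps (latest row per action, then the difference to the previous action in date order). It is not part of the original answer; the function name last_occurrence_diffs and the ROWS list are made up for illustration, and the day difference is computed on calendar dates on the assumption that this mirrors DATEDIFF(day, ...).

```python
from datetime import datetime

# Sample rows from the question: (Record_Created, Action).
ROWS = [
    ("1/11/18 5:24 PM", "Action 1"),
    ("1/11/18 5:32 PM", "Action 2"),
    ("1/17/18 4:41 PM", "Action 3"),
    ("1/17/18 4:41 PM", "Action 2"),
    ("1/17/18 4:44 PM", "Action 3"),
    ("1/18/18 11:12 AM", "Action 4"),
    ("1/18/18 11:12 AM", "Action 3"),
    ("1/18/18 11:13 AM", "Action 4"),
    ("1/25/18 2:44 PM", "Action 5"),
]


def last_occurrence_diffs(rows):
    """Return [(action, days_since_previous_action)] ordered by last occurrence."""
    # Step 1: the GROUP BY action / MAX(record_created) part.
    last_seen = {}
    for stamp, action in rows:
        created = datetime.strptime(stamp, "%m/%d/%y %I:%M %p")
        if action not in last_seen or created > last_seen[action]:
            last_seen[action] = created

    # Step 2: the LAG(...) OVER (ORDER BY MAX(record_created)) part.
    ordered = sorted(last_seen.items(), key=lambda item: item[1])
    result = []
    previous = None
    for action, created in ordered:
        # Compare calendar dates only (assumed to match DATEDIFF(day, ...)).
        diff = None if previous is None else (created.date() - previous.date()).days
        result.append((action, diff))
        previous = created
    return result


if __name__ == "__main__":
    for action, diff in last_occurrence_diffs(ROWS):
        print(action, diff)
```

On the sample data this gives 6, 1, 0 and 7 for Action 2 through Action 5, plus a null (None) entry for Action 1.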

#2


3  

You could take the minimum and maximum dates for each action type and use DATEDIFF to get the number of days between them:

SELECT   action, DATEDIFF(DAY, MIN(record_created), MAX(record_created))
FROM     mytable
GROUP BY action
HAVING   COUNT(*) > 1
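
For comparison, a small Python sketch of what this query computes on the question's sample data: the span between the first and last row of each action, skipping actions that occur only once. The names span_per_action and ROWS are hypothetical, and the calendar-date subtraction is an assumption about how DATEDIFF(DAY, ...) behaves.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (Record_Created, Action).
ROWS = [
    ("1/11/18 5:24 PM", "Action 1"),
    ("1/11/18 5:32 PM", "Action 2"),
    ("1/17/18 4:41 PM", "Action 3"),
    ("1/17/18 4:41 PM", "Action 2"),
    ("1/17/18 4:44 PM", "Action 3"),
    ("1/18/18 11:12 AM", "Action 4"),
    ("1/18/18 11:12 AM", "Action 3"),
    ("1/18/18 11:13 AM", "Action 4"),
    ("1/25/18 2:44 PM", "Action 5"),
]


def span_per_action(rows):
    """Days between the first and last row of each action seen more than once."""
    stamps = defaultdict(list)
    for stamp, action in rows:
        stamps[action].append(datetime.strptime(stamp, "%m/%d/%y %I:%M %p"))
    return {
        action: (max(times).date() - min(times).date()).days
        for action, times in stamps.items()
        if len(times) > 1  # the HAVING COUNT(*) > 1 filter
    }


if __name__ == "__main__":
    print(span_per_action(ROWS))
```

On this sample the values for Action 2, 3 and 4 happen to match the requested output, but Action 5 is missing because it has a single row, so this measures the spread within an action rather than the gap between actions.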

#3


0  

I don't know where you are getting Action 5 from.

declare @T table (dt datetime, action varchar(10));
insert into @T values 
       ('1/11/18 5:24 PM',  'Action 1')
     , ('1/11/18 5:32 PM',  'Action 2')
     , ('1/17/18 4:41 PM',  'Action 3')
     , ('1/17/18 4:41 PM',  'Action 2')
     , ('1/17/18 4:44 PM ', 'Action 3')
     , ('1/18/18 11:12 AM', 'Action 4')
     , ('1/18/18 11:12 AM', 'Action 3')
     , ('1/18/18 11:13 AM', 'Action 4')
     , ('1/25/18 2:44 PM',  'Action 5');

select * from @t order by action, dt desc

select tt.action, tt.dt, tt.leaddt, DATEDIFF(day, tt.leaddt, tt.dt) as diff 
  from ( select t.* 
              , ROW_NUMBER() over (partition by t.action order by t.dt desc) as rn 
              , lead(t.dt)   over (partition by t.action order by t.dt desc) as leaddt 
           from @T t 
       ) tt
where tt.rn = 1 
  and tt.leaddt is not null 
order by tt.action

dt                      action
----------------------- ----------
2018-01-11 17:24:00.000 Action 1
2018-01-17 16:41:00.000 Action 2
2018-01-11 17:32:00.000 Action 2
2018-01-18 11:12:00.000 Action 3
2018-01-17 16:44:00.000 Action 3
2018-01-17 16:41:00.000 Action 3
2018-01-18 11:13:00.000 Action 4
2018-01-18 11:12:00.000 Action 4
2018-01-25 14:44:00.000 Action 5

action     dt                      leaddt                  diff
---------- ----------------------- ----------------------- -----------
Action 2   2018-01-17 16:41:00.000 2018-01-11 17:32:00.000 6
Action 3   2018-01-18 11:12:00.000 2018-01-17 16:44:00.000 1
Action 4   2018-01-18 11:13:00.000 2018-01-18 11:12:00.000 0
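
One detail worth keeping in mind when reading these numbers: DATEDIFF(day, ...) in SQL Server counts the day boundaries crossed, not elapsed 24-hour periods. The Python sketch below illustrates the difference using the two Action 3 timestamps from the output above; the helper names are hypothetical.

```python
from datetime import datetime


def datediff_days(start, end):
    """Calendar-day boundaries crossed between start and end."""
    return (end.date() - start.date()).days


def elapsed_full_days(start, end):
    """Whole 24-hour periods elapsed between start and end."""
    return (end - start).days


earlier = datetime(2018, 1, 17, 16, 44)
later = datetime(2018, 1, 18, 11, 12)

if __name__ == "__main__":
    print(datediff_days(earlier, later))      # boundary count
    print(elapsed_full_days(earlier, later))  # full days elapsed
```

The two timestamps are less than 24 hours apart, so the elapsed count is 0, while the boundary count is 1 because midnight falls between them. That is why Action 3 shows a diff of 1.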

#4


0  

This might be a somewhat brute-force solution, but it should do the job. The logic is:
1. Get the max date for each action.
2. Assign a row number to each aggregated row so you can look up the previous row in a subquery.
3. Calculate the difference.

;WITH cte1 as
(select action, max(record_created) as MaxDt, ROW_Number() OVER(Order by Action) as row_num
 from @YourTable
 group by action
) 

select *, (select DATEDIFF(DAY, b.MaxDT, a.MaxDT) 
            from cte1 b 
           where b.row_num= a.row_num-1 ) as Diff
from cte1 a 
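
Because the row numbers here are assigned in action-name order, the result matches the question only when the names sort in the same order as their last dates. The Python sketch below shows what can go wrong otherwise. The data is hypothetical (not from the question), and Python's string comparison stands in for the database's ordering, which in SQL Server depends on the collation.

```python
from datetime import date

# Hypothetical last-occurrence dates, chosen so that name order
# and date order disagree ("Action 10" sorts before "Action 9").
LAST_SEEN = {
    "Action 9": date(2018, 1, 18),
    "Action 10": date(2018, 1, 25),
}


def diffs_in_order(last_seen, key):
    """Pair each action with the day gap to the previous one under the given sort key."""
    ordered = sorted(last_seen.items(), key=key)
    return [
        (action, None if i == 0 else (day - ordered[i - 1][1]).days)
        for i, (action, day) in enumerate(ordered)
    ]


by_name = diffs_in_order(LAST_SEEN, key=lambda item: item[0])
by_date = diffs_in_order(LAST_SEEN, key=lambda item: item[1])

if __name__ == "__main__":
    print(by_name)
    print(by_date)
```

Ordering by name yields a negative gap for "Action 9", whereas ordering by the max date gives the expected 7 days for "Action 10". Under that assumption, numbering the rows by MaxDt instead of Action would be the safer choice.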
