I have a table with more than 5 million rows of sales transactions. I would like to find the sum of the date intervals between each customer's three most recent purchases.
Suppose my table looks like this:
CustomerID ProductID ServiceStartDate ServiceExpiryDate
A X1 2010-01-01 2010-06-01
A X2 2010-08-12 2010-12-30
B X4 2011-10-01 2012-01-15
B X3 2012-04-01 2012-06-01
B X7 2012-08-01 2013-10-01
A X5 2013-01-01 2015-06-01
The result I'm looking for might look like this:
CustomerID IntervalDays
A 802
B 135
I know the query needs to first retrieve the 3 most recent transactions of each customer (based on ServiceStartDate) and then calculate the interval between the startDate and ExpiryDate of his/her transactions.
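To sanity-check the expected figures, here is a small Python sketch (not part of the question) that treats each "interval" as the number of days between one purchase's ServiceExpiryDate and the next purchase's ServiceStartDate, using only the three most recent purchases by ServiceStartDate. The helper name gap_days is hypothetical. On the sample rows it gives 805 for A and 138 for B, which is 3 days more than the figures above for each customer; the sketch does not try to guess where that difference comes from.

```python
from datetime import date

# Sample rows from the question: (CustomerID, ProductID, ServiceStartDate, ServiceExpiryDate)
ROWS = [
    ("A", "X1", date(2010, 1, 1), date(2010, 6, 1)),
    ("A", "X2", date(2010, 8, 12), date(2010, 12, 30)),
    ("B", "X4", date(2011, 10, 1), date(2012, 1, 15)),
    ("B", "X3", date(2012, 4, 1), date(2012, 6, 1)),
    ("B", "X7", date(2012, 8, 1), date(2013, 10, 1)),
    ("A", "X5", date(2013, 1, 1), date(2015, 6, 1)),
]


def gap_days(rows, customer, last_n=3):
    """Hypothetical helper: day gaps between consecutive purchases
    (previous expiry -> next start) among the customer's last_n purchases."""
    # Oldest first by ServiceStartDate, then keep only the last_n most recent.
    recent = sorted((r for r in rows if r[0] == customer), key=lambda r: r[2])[-last_n:]
    return [(nxt[2] - prev[3]).days for prev, nxt in zip(recent, recent[1:])]


for customer in ("A", "B"):
    gaps = gap_days(ROWS, customer)
    print(customer, gaps, sum(gaps))
# A [72, 733] 805
# B [77, 61] 138
```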
3 Answers
#1
You want to calculate the difference between the previous row's ServiceExpiryDate and the current row's ServiceStartDate based on descending dates and then sum up the last two differences:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc
, ServiceExpiryDate desc -- don't know if this 2nd column is necessary
) as rn
from tab
)
select t2.customerId,
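sum(datediff(day, t1.ServiceExpiryDate, t2.ServiceStartDate)) as Intervaldays -- t1 = previous purchase, t2 = current one
,count(*) as purchases
from cte as t2 left join cte as t1
on t1.customerId = t2.customerId
and t1.rn = t2.rn+1 -- previous and current row
and t1.rn <= 3 -- don't pair the 3rd purchase with an older 4th one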
where t2.rn <= 3 -- last three rows
group by t2.customerId;
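The self-join can be tried out in SQLite through Python's sqlite3 module. This is a sketch under a few assumptions: julianday() differences stand in for SQL Server's DATEDIFF(day, ...), dates are stored as ISO-8601 text, the table is named tab as in the answer, and the bundled SQLite is 3.25 or newer (needed for ROW_NUMBER). The extra join condition t1.rn <= 3 keeps a customer's 4th purchase out of the sum.

```python
import sqlite3

# Sample data from the question; ISO-8601 date strings sort correctly as text.
ROWS = [
    ("A", "X1", "2010-01-01", "2010-06-01"),
    ("A", "X2", "2010-08-12", "2010-12-30"),
    ("B", "X4", "2011-10-01", "2012-01-15"),
    ("B", "X3", "2012-04-01", "2012-06-01"),
    ("B", "X7", "2012-08-01", "2013-10-01"),
    ("A", "X5", "2013-01-01", "2015-06-01"),
]

SELF_JOIN_SQL = """
with cte as (
    select tab.*,
           row_number() over (partition by CustomerID
                              order by ServiceStartDate desc) as rn
    from tab
)
select t2.CustomerID,
       -- julianday() difference stands in for SQL Server's DATEDIFF(day, ...)
       sum(cast(julianday(t2.ServiceStartDate) - julianday(t1.ServiceExpiryDate) as integer)) as IntervalDays,
       count(*) as purchases
from cte as t2
left join cte as t1
       on t1.CustomerID = t2.CustomerID
      and t1.rn = t2.rn + 1   -- t1 is the purchase just before t2
      and t1.rn <= 3          -- don't reach past the third most recent purchase
where t2.rn <= 3
group by t2.CustomerID
order by t2.CustomerID
"""


def make_conn(rows=ROWS):
    """Build an in-memory database holding the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table tab (CustomerID text, ProductID text, "
        "ServiceStartDate text, ServiceExpiryDate text)"
    )
    conn.executemany("insert into tab values (?, ?, ?, ?)", rows)
    return conn


result = make_conn().execute(SELF_JOIN_SQL).fetchall()
print(result)  # [('A', 805, 3), ('B', 138, 3)]
```

Adding an older 4th purchase for customer A leaves A's total at 805, because the rn = 3 row then joins to nothing and its NULL difference is ignored by SUM.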
Same result using LEAD:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc) as rn
,lead(ServiceExpiryDate)
over (partition by customerId
order by ServiceStartDate desc
) as prevEnd
from tab
)
select customerId,
sum(case when rn < 3 then datediff(day, prevEnd, ServiceStartDate) end) as Intervaldays -- skip rn = 3, whose prevEnd would come from an older 4th purchase
,count(*) as purchases
from cte
where rn <= 3
group by customerId;
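The LEAD variant can be sketched the same way in SQLite (3.25+ for LEAD/ROW_NUMBER), again with julianday() substituted for DATEDIFF. Because the window is ordered by ServiceStartDate descending, lead() returns the expiry of the next older purchase. The CASE expression is an addition in this sketch: it skips rn = 3 so only the two gaps among the last three purchases are summed.

```python
import sqlite3

# Sample data from the question; ISO-8601 date strings sort correctly as text.
ROWS = [
    ("A", "X1", "2010-01-01", "2010-06-01"),
    ("A", "X2", "2010-08-12", "2010-12-30"),
    ("B", "X4", "2011-10-01", "2012-01-15"),
    ("B", "X3", "2012-04-01", "2012-06-01"),
    ("B", "X7", "2012-08-01", "2013-10-01"),
    ("A", "X5", "2013-01-01", "2015-06-01"),
]

LEAD_SQL = """
with cte as (
    select tab.*,
           row_number() over (partition by CustomerID
                              order by ServiceStartDate desc) as rn,
           -- descending order: lead() yields the previous (older) purchase's expiry
           lead(ServiceExpiryDate) over (partition by CustomerID
                                         order by ServiceStartDate desc) as prevEnd
    from tab
)
select CustomerID,
       sum(case when rn < 3
                then cast(julianday(ServiceStartDate) - julianday(prevEnd) as integer)
           end) as IntervalDays,
       count(*) as purchases
from cte
where rn <= 3
group by CustomerID
order by CustomerID
"""


def make_conn(rows=ROWS):
    """Build an in-memory database holding the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table tab (CustomerID text, ProductID text, "
        "ServiceStartDate text, ServiceExpiryDate text)"
    )
    conn.executemany("insert into tab values (?, ?, ?, ?)", rows)
    return conn


result = make_conn().execute(LEAD_SQL).fetchall()
print(result)  # [('A', 805, 3), ('B', 138, 3)]
```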
Neither query returns the expected result unless you subtract purchases (or max(rn)) from Intervaldays. But since only two differences are summed, that doesn't seem correct to me either...
Additional logic must be applied based on your rules regarding:
- customer has fewer than 3 purchases
- overlapping intervals
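The two open rules can be made concrete with a short Python sketch. The rules below are assumptions chosen for illustration, not the asker's: a customer with fewer than two purchases has no interval (None), and an overlap (the next purchase starts before the previous one expires) counts as a 0-day gap. The function name recent_gap_total is hypothetical.

```python
from datetime import date


def recent_gap_total(purchases, last_n=3):
    """Hypothetical rules: sum of (next start - previous expiry) in days over the
    last_n most recent purchases; overlaps count as 0; fewer than 2 purchases -> None.

    purchases: iterable of (start_date, expiry_date) tuples for one customer.
    """
    recent = sorted(purchases)[-last_n:]  # sorted by start date, then expiry date
    if len(recent) < 2:
        return None  # no interval can be measured
    # Clamp negative gaps (overlapping service periods) to zero.
    return sum(max(0, (nxt[0] - prev[1]).days) for prev, nxt in zip(recent, recent[1:]))


customer_a = [(date(2010, 1, 1), date(2010, 6, 1)),
              (date(2010, 8, 12), date(2010, 12, 30)),
              (date(2013, 1, 1), date(2015, 6, 1))]
overlapping = [(date(2020, 1, 1), date(2020, 3, 1)),
               (date(2020, 2, 1), date(2020, 4, 1))]

print(recent_gap_total(customer_a))   # 805
print(recent_gap_total(overlapping))  # 0
print(recent_gap_total([(date(2020, 1, 1), date(2020, 2, 1))]))  # None
```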
#2
Assuming there are no overlaps, I think you want this:
select customerId,
sum(datediff(day, ServiceStartDate, ServiceExpiryDate)) as Intervaldays -- length of each service period
from (select t.*, row_number() over (partition by customerId
order by ServiceStartDate desc) as seqnum
from yourTable t -- replace with your table name
) t
where seqnum <= 3
group by customerId;
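This query sums each purchase's own service period (start to expiry) rather than the gaps between purchases. A SQLite sketch through Python's sqlite3 shows what it returns on the sample data; julianday() is substituted for DATEDIFF, yourTable is a placeholder name, and SQLite 3.25+ is assumed for ROW_NUMBER.

```python
import sqlite3

# Sample data from the question; ISO-8601 date strings sort correctly as text.
ROWS = [
    ("A", "X1", "2010-01-01", "2010-06-01"),
    ("A", "X2", "2010-08-12", "2010-12-30"),
    ("B", "X4", "2011-10-01", "2012-01-15"),
    ("B", "X3", "2012-04-01", "2012-06-01"),
    ("B", "X7", "2012-08-01", "2013-10-01"),
    ("A", "X5", "2013-01-01", "2015-06-01"),
]

DURATION_SQL = """
select CustomerID,
       -- expiry minus start of the same row: the service period, not a gap
       sum(cast(julianday(ServiceExpiryDate) - julianday(ServiceStartDate) as integer)) as IntervalDays
from (select t.*, row_number() over (partition by CustomerID
                                     order by ServiceStartDate desc) as seqnum
      from yourTable t) t
where seqnum <= 3
group by CustomerID
order by CustomerID
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table yourTable (CustomerID text, ProductID text, "
             "ServiceStartDate text, ServiceExpiryDate text)")
conn.executemany("insert into yourTable values (?, ?, ?, ?)", ROWS)

result = conn.execute(DURATION_SQL).fetchall()
print(result)  # [('A', 1172), ('B', 593)]
```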
#3
Try this:
SELECT dt.CustomerID,
SUM(DATEDIFF(DAY, dt.PrevExpiry, dt.ServiceStartDate)) As IntervalDays
FROM (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY ServiceStartDate DESC) AS rn
, (SELECT Max(ti.ServiceExpiryDate)
FROM yourTable ti
WHERE t.CustomerID = ti.CustomerID
AND ti.ServiceStartDate < t.ServiceStartDate) As PrevExpiry
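FROM yourTable t )dt
WHERE dt.rn <= 2 -- only the gaps before the two most recent purchases, i.e. the gaps among the last three
GROUP BY dt.CustomerID;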
Result will be:
CustomerId | IntervalDays
-----------+--------------
A | 805
B | 138
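The correlated-subquery approach can be checked in SQLite via Python's sqlite3, with julianday() in place of DATEDIFF and yourTable as a placeholder name (SQLite 3.25+ assumed for ROW_NUMBER). PrevExpiry is the latest expiry among the customer's earlier-starting purchases; the sketch keeps only rn <= 2, so the gaps counted are the two among the three most recent purchases. On a 5-million-row table the subquery runs once per row, so an index on (CustomerID, ServiceStartDate) would likely matter; that is untested here.

```python
import sqlite3

# Sample data from the question; ISO-8601 date strings sort correctly as text.
ROWS = [
    ("A", "X1", "2010-01-01", "2010-06-01"),
    ("A", "X2", "2010-08-12", "2010-12-30"),
    ("B", "X4", "2011-10-01", "2012-01-15"),
    ("B", "X3", "2012-04-01", "2012-06-01"),
    ("B", "X7", "2012-08-01", "2013-10-01"),
    ("A", "X5", "2013-01-01", "2015-06-01"),
]

PREV_EXPIRY_SQL = """
select dt.CustomerID,
       sum(cast(julianday(dt.ServiceStartDate) - julianday(dt.PrevExpiry) as integer)) as IntervalDays
from (
    select t.*,
           row_number() over (partition by t.CustomerID
                              order by t.ServiceStartDate desc) as rn,
           -- latest expiry among this customer's earlier-starting purchases
           (select max(ti.ServiceExpiryDate)
              from yourTable ti
             where ti.CustomerID = t.CustomerID
               and ti.ServiceStartDate < t.ServiceStartDate) as PrevExpiry
    from yourTable t
) dt
where dt.rn <= 2
group by dt.CustomerID
order by dt.CustomerID
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table yourTable (CustomerID text, ProductID text, "
             "ServiceStartDate text, ServiceExpiryDate text)")
conn.executemany("insert into yourTable values (?, ?, ?, ?)", ROWS)

result = conn.execute(PREV_EXPIRY_SQL).fetchall()
print(result)  # [('A', 805), ('B', 138)]
```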