I have a transactional database with sales data and user IDs, like the following:
id_usuarioweb dt_fechaventa
1551415 2015-08-01 14:57:21.737
1551415 2015-08-06 15:34:21.920
6958538 2015-07-30 09:26:24.427
6958538 2015-08-05 09:30:06.247
6958538 2015-08-31 17:39:02.027
39101175 2015-08-05 16:34:17.990
39101175 2015-09-20 20:37:26.043
1551415 2015-09-05 13:41:43.767
3673384 2015-09-06 13:34:23.440
I would like to calculate the average difference between consecutive purchase dates for each customer in the database (to find the average frequency with which each user buys).
I'm aware I can use datediff with two columns, but I'm having issues doing it within the same field while "grouping" by user id.
The desired outcome would be like this:
id_usuarioweb avgtime_days
1551415 5
6958538 25
39101175 25
1551415 0
3673384 0
How can I achieve this? I would have the database ordered by user_id and then dt_fechaventa (the sale time).
USING: SQL Server 2008
3 Answers
#1
4
I think what you are looking for is calculated like this. Take the maximum and minimum dates, get the difference between them, and divide by the number of purchases minus one (that is, the number of gaps between purchases).
SELECT id_usuarioweb,
       CASE
           WHEN COUNT(*) < 2 THEN 0
           ELSE DATEDIFF(dd, MIN(dt_fechaventa), MAX(dt_fechaventa)) / (COUNT(*) - 1)
       END AS avgtime_days
FROM mytable
GROUP BY id_usuarioweb
EDIT: (by @GordonLinoff)
The reason that this is correct is easily seen if you look at the math. Consider three dates, a, b, and c.
The average time between them is:
((b - a) + (c - b)) / 2
This simplifies to:
(c - a) / 2
In other words, the intermediate value cancels out, and this holds regardless of how many intermediate values there are.
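For the general case, the same cancellation can be written as a telescoping sum. The notation here is an assumption introduced for illustration and is not part of the original answer: d_1 <= d_2 <= ... <= d_n are one customer's purchase dates in ascending order.

```latex
\[
\frac{1}{n-1}\sum_{i=1}^{n-1}\left(d_{i+1}-d_{i}\right)
  = \frac{d_{n}-d_{1}}{n-1}, \qquad n \ge 2 .
\]
```

With n = 3 this reduces to (c - a) / 2 as above. In the query, d_1 and d_n correspond to MIN(dt_fechaventa) and MAX(dt_fechaventa), and n - 1 to COUNT(*) - 1.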
#2
2
This should do it:
;WITH CTE AS
(
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY id_usuarioweb ORDER BY dt_fechaventa),
N = COUNT(*) OVER(PARTITION BY id_usuarioweb)
FROM dbo.YourTable
)
SELECT A.id_usuarioweb,
AVG(DATEDIFF(DAY,A.dt_fechaventa,B.dt_fechaventa)) avgtime_days
FROM CTE A
INNER JOIN CTE B
ON A.id_usuarioweb = B.id_usuarioweb
AND A.RN = B.RN - 1
WHERE A.N > 1
GROUP BY A.id_usuarioweb;
I'm filtering out the users that have only one row, because you can't calculate an average number of days for them.
Here is a demo of this on sqlfiddle. The results are:
╔═══════════════╦══════════════╗
║ id_usuarioweb ║ avgtime_days ║
╠═══════════════╬══════════════╣
║ 1551415 ║ 17 ║
║ 6958538 ║ 16 ║
║ 39101175 ║ 46 ║
╚═══════════════╩══════════════╝
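The figures in this table can be checked by hand against the sample rows from the question. The following is only a sketch in Python, not part of the original answer: the names SALES and average_gap_days are hypothetical, and it assumes that counting whole calendar days between dates mirrors what DATEDIFF(DAY, ...) does and that integer division mirrors AVG over integers.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows copied from the question: (id_usuarioweb, dt_fechaventa).
SALES = [
    (1551415, "2015-08-01 14:57:21.737"),
    (1551415, "2015-08-06 15:34:21.920"),
    (6958538, "2015-07-30 09:26:24.427"),
    (6958538, "2015-08-05 09:30:06.247"),
    (6958538, "2015-08-31 17:39:02.027"),
    (39101175, "2015-08-05 16:34:17.990"),
    (39101175, "2015-09-20 20:37:26.043"),
    (1551415, "2015-09-05 13:41:43.767"),
    (3673384, "2015-09-06 13:34:23.440"),
]


def average_gap_days(rows):
    """Average whole-day gap between consecutive purchases, per user.

    Users with a single purchase are left out, like the WHERE A.N > 1 filter.
    """
    dates_by_user = defaultdict(list)
    for user_id, stamp in rows:
        # Keep only the calendar date, so gaps count date boundaries crossed.
        parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
        dates_by_user[user_id].append(parsed.date())

    result = {}
    for user_id, dates in dates_by_user.items():
        if len(dates) < 2:
            continue
        dates.sort()
        # Pair each purchase with the next one (the RN = RN - 1 join).
        gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
        # The consecutive gaps add up to last-minus-first.
        assert sum(gaps) == (dates[-1] - dates[0]).days
        result[user_id] = sum(gaps) // len(gaps)
    return result


print(average_gap_days(SALES))
# {1551415: 17, 6958538: 16, 39101175: 46}
```

User 3673384 has a single purchase and therefore does not appear, just as in the table above.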
#3
1
You can first number the rows using row_number, then self-join the CTE so that each row is paired with the next one for the same user, and then take the average. However, you would get one row per user, which does not match the expected result shown in the question.
with x as
(select id_usuarioweb, dt_fechaventa,
row_number() over(partition by id_usuarioweb order by dt_fechaventa) as rn
from tablename)
select x1.id_usuarioweb, avg(datediff(dd,x1.dt_fechaventa,x2.dt_fechaventa)) as avgdiff
from x x1 join x x2
on x1.id_usuarioweb = x2.id_usuarioweb and x1.rn = x2.rn-1
group by x1.id_usuarioweb