I have a MySQL database that stores some timestamps. Let's assume the table contains only an ID and a timestamp. The timestamps may be duplicated.
I want to find the average time difference between consecutive rows that are not duplicates (timewise). Is there a way to do it in SQL?
3 solutions
#1
29
If your table is t, and your timestamp column is ts, and you want the answer in seconds:
SELECT TIMESTAMPDIFF(SECOND, MIN(ts), MAX(ts))
       / (COUNT(DISTINCT ts) - 1)
FROM t
This will be miles quicker for large tables, as it has no n-squared self-JOIN.
This uses a cute mathematical trick. Ignore the problem of duplicates for the moment. The average time difference between consecutive rows is the difference between the first timestamp and the last timestamp, divided by the number of rows minus 1.
Proof: The average distance between consecutive rows is the sum of the distances between consecutive rows, divided by the number of consecutive pairs. But the sum of the differences between consecutive rows is just the distance between the first row and the last row (assuming they are sorted by timestamp), because every intermediate timestamp cancels out. And the number of consecutive pairs is the total number of rows minus 1.
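Written out, the proof is a telescoping sum. The symbols here are only notation introduced for illustration: n is the number of rows and t_1 <= t_2 <= ... <= t_n are the timestamps in sorted order.

```latex
\[
\frac{1}{n-1}\sum_{i=1}^{n-1}\bigl(t_{i+1}-t_i\bigr)
  = \frac{(t_2-t_1)+(t_3-t_2)+\dots+(t_n-t_{n-1})}{n-1}
  = \frac{t_n-t_1}{n-1}
\]
```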
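Then we just restrict the count to distinct timestamps: duplicates do not change MIN(ts) or MAX(ts), so using COUNT(DISTINCT ts) instead of COUNT(*) is all it takes to ignore them.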
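To check the trick outside the database, here is a minimal Python sketch that computes the average gap both ways on a small list containing a duplicate. The function names and the sample data are made up for illustration; they are not part of the original answer.

```python
from datetime import datetime


def direct_average(timestamps):
    """Average the gaps between consecutive distinct timestamps, in seconds."""
    distinct = sorted(set(timestamps))
    gaps = [
        (later - earlier).total_seconds()
        for earlier, later in zip(distinct, distinct[1:])
    ]
    # Raises ZeroDivisionError if there are fewer than two distinct values.
    return sum(gaps) / len(gaps)


def shortcut_average(timestamps):
    """Same quantity via (max - min) / (distinct count - 1), in seconds."""
    distinct = set(timestamps)
    # Raises ZeroDivisionError if there are fewer than two distinct values.
    return (max(distinct) - min(distinct)).total_seconds() / (len(distinct) - 1)


# Hypothetical sample data: 10:00:00 appears twice.
sample = [
    datetime(2024, 1, 1, 10, 0, 0),
    datetime(2024, 1, 1, 10, 0, 0),
    datetime(2024, 1, 1, 10, 0, 10),
    datetime(2024, 1, 1, 10, 0, 40),
]

print(direct_average(sample), shortcut_average(sample))  # prints "20.0 20.0"
```

The distinct gaps are 10 and 30 seconds, so both approaches give 20 seconds; the shortcut never has to sort or pair up rows.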
#2
2
Are the IDs contiguous?
If so, you could do something like:
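SELECT
    a.ID
  , b.ID
  , a.Timestamp
  , b.Timestamp
  -- a is the later row (a.ID = b.ID + 1), so measure from b to a;
  -- plain "-" on TIMESTAMP values does not yield seconds in MySQL
  , TIMESTAMPDIFF(SECOND, b.Timestamp, a.Timestamp) AS Difference
FROM
    MyTable a
    JOIN MyTable b
        ON a.ID = b.ID + 1 AND a.Timestamp <> b.Timestamp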
That'll give you a list of time differences on each consecutive row pair...
Then you could wrap the Difference expression in AVG() to get a single average...
#3
1
Here's one way:
-- your_table is a placeholder name; TABLE itself is a reserved word in MySQL
select avg(timestampdiff(MINUTE, prev.datecol, cur.datecol))
from your_table cur
inner join your_table prev
    on cur.id = prev.id + 1
    and cur.datecol <> prev.datecol
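The first argument of timestampdiff selects the unit: SECOND, MINUTE, HOUR, DAY, MONTH, and so on. The result is a whole number of units, so with MINUTE any gap shorter than a minute counts as 0; use SECOND if you need finer resolution.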
If the IDs are not consecutive, you can select the previous row by adding a rule that there are no other rows in between:
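-- your_table is a placeholder name; TABLE itself is a reserved word in MySQL
select avg(timestampdiff(MINUTE, prev.datecol, cur.datecol))
from your_table cur
inner join your_table prev
    on prev.datecol < cur.datecol
    and not exists (
        -- no row falls strictly between prev and cur
        select *
        from your_table inbetween
        where prev.datecol < inbetween.datecol
          and inbetween.datecol < cur.datecol
    )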