Finding the difference between datetimes in the same column separated by the same number of rows

Time: 2022-01-17 21:33:47

I'm currently examining a number of subjects (id) and how many times they are recorded visiting a specific location (location) over a period of time. Rather than visually identifying when each subject arrives at a location and recording the date-time (datetime), we are using simple motion detection to increase our coverage. Unfortunately, some of these technologies can record "false detections", which make it appear as if a subject was present when it was not.

In order to confidently assume that a subject did visit the location, the manufacturer recommends that there must be at least 3 recordings per 30 minutes. The df data.table/data.frame below is an example:

> df <- data.table(df, key = c("id", "location", "datetime"))
> df
    id            datetime location
 1:  1 2014-06-01 08:03:00        a
 2:  1 2014-06-01 08:56:00        a
 3:  1 2014-06-01 08:58:00        a
 4:  1 2014-06-01 09:09:00        a
 5:  1 2014-06-01 09:20:00        a
 6:  1 2014-06-01 08:28:00        b
 7:  1 2014-06-01 08:33:00        b
 8:  1 2014-06-01 08:38:00        b
 9:  1 2014-06-01 08:42:00        b
10:  1 2014-06-01 09:31:00        b
11:  1 2014-06-01 08:18:00        c
12:  1 2014-06-01 08:50:00        c
13:  1 2014-06-01 08:52:00        c
14:  1 2014-06-01 08:53:00        c
15:  1 2014-06-01 09:05:00        c
16:  2 2014-06-01 09:35:00        a
17:  2 2014-06-01 09:45:00        a
18:  2 2014-06-01 10:40:00        a
19:  2 2014-06-01 10:44:00        a
20:  2 2014-06-01 10:59:00        a
21:  2 2014-06-01 11:04:00        a
22:  2 2014-06-01 09:54:00        b
23:  2 2014-06-01 10:12:00        b
24:  2 2014-06-01 09:40:00        c
25:  2 2014-06-01 10:01:00        c
26:  2 2014-06-01 10:07:00        c
27:  2 2014-06-01 10:19:00        c
28:  2 2014-06-01 10:32:00        c
29:  2 2014-06-01 10:49:00        c
30:  2 2014-06-01 10:57:00        c

The key used above sorts the data by subject (id), visited location (location), and the time of each visit (datetime). With the data.table organized this way, all that remains is to determine if and when the time spanned by 3 consecutive recordings exceeds 30 minutes. My desired output would be the following:

> df
    id            datetime location diff_min
 1:  1 2014-06-01 08:03:00        a       55
 2:  1 2014-06-01 08:56:00        a       13
 3:  1 2014-06-01 08:58:00        a       22
 4:  1 2014-06-01 09:09:00        a       NA  <-----
 5:  1 2014-06-01 09:20:00        a       NA  <-----
 6:  1 2014-06-01 08:28:00        b       10
 7:  1 2014-06-01 08:33:00        b        9
 8:  1 2014-06-01 08:38:00        b       53
 9:  1 2014-06-01 08:42:00        b       NA  <-----
10:  1 2014-06-01 09:31:00        b       NA  <-----
11:  1 2014-06-01 08:18:00        c       34
12:  1 2014-06-01 08:50:00        c        3
13:  1 2014-06-01 08:52:00        c       13
14:  1 2014-06-01 08:53:00        c       NA  <-----
15:  1 2014-06-01 09:05:00        c       NA  <-----
16:  2 2014-06-01 09:35:00        a       65
17:  2 2014-06-01 09:45:00        a       59
18:  2 2014-06-01 10:40:00        a       19
19:  2 2014-06-01 10:44:00        a       20
20:  2 2014-06-01 10:59:00        a       NA  <-----
21:  2 2014-06-01 11:04:00        a       NA  <-----
22:  2 2014-06-01 09:54:00        b       NA  <-----
23:  2 2014-06-01 10:12:00        b       NA  <-----
24:  2 2014-06-01 09:40:00        c       27
25:  2 2014-06-01 10:01:00        c       18
26:  2 2014-06-01 10:07:00        c       25
27:  2 2014-06-01 10:19:00        c       30
28:  2 2014-06-01 10:32:00        c       25
29:  2 2014-06-01 10:49:00        c       NA  <-----
30:  2 2014-06-01 10:57:00        c       NA  <-----

Note the <----- markers that point out NA values. Since I am taking the difftime() between each row and the row two below it (3 recordings total), the last two rows/recordings per id and location are NA because fewer than 3 recordings remain. Any location with 2 or fewer recordings automatically gets NA values.

I attempted to solve this on my own with the following code but I am no closer to solving it:

> df[, diff_min := lapply(.SD, function(x) c(difftime(x[3:length(x)], x[1:(length(x)-2)], units = "mins"), NA, NA)), 
+    .SDcols = "datetime", by = c("id", "location")]
Warning message:
In `[.data.table`(df, , `:=`(diff_min, lapply(.SD, function(x) c(difftime(x[3:length(x)],  :
  RHS 1 is length 4 (greater than the size (2) of group 5). The last 2 element(s) will be discarded.

Please see the dput() output below if you care to try it out:

> dput(df)
structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), datetime = structure(c(1401624180L, 
1401627360L, 1401627480L, 1401628140L, 1401628800L, 1401625680L, 
1401625980L, 1401626280L, 1401626520L, 1401629460L, 1401625080L, 
1401627000L, 1401627120L, 1401627180L, 1401627900L, 1401629700L, 
1401630300L, 1401633600L, 1401633840L, 1401634740L, 1401635040L, 
1401630840L, 1401631920L, 1401630000L, 1401631260L, 1401631620L, 
1401632340L, 1401633120L, 1401634140L, 1401634620L), class = c("POSIXct", 
"POSIXt"), tzone = ""), location = structure(c(1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("a", "b", 
"c"), class = "factor")), .Names = c("id", "datetime", "location"
), row.names = c(NA, -30L), class = c("data.table", "data.frame"
), sorted = c("id", "location", "datetime"), .internal.selfref = <pointer: 0x0000000000100788>)

Please feel free to ask questions and use any coding package to achieve the desired output (e.g. base, dplyr). Thanks for your time!

1 solution

#1


Use rollapply from zoo:

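library(zoo)
library(data.table)

# Minutes between the first and third recording in a window of 3
Diff <- function(x) difftime(x[3], x[1], units = "mins")
# Left-aligned rolling window per id/location; incomplete windows are filled with NA
df[, diff_min := rollapply(datetime, 3, Diff, align = "left", fill = NA), 
       by = list(id, location)]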
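
For comparison, the same lead-by-two difference can probably be computed without zoo by using shift() from data.table (available in data.table 1.9.6 and later). The following is only a minimal sketch under stated assumptions: the sample rows are a small made-up subset in the style of the question's data, the timezone is set to UTC for reproducibility, and the confirmed column (a hypothetical name, not part of the question) treats a span of exactly 30 minutes as acceptable.

```r
library(data.table)

# Small illustrative sample (assumption: not the full data from the question).
# One group has four recordings, the other only two.
dt <- data.table(
  id       = c(1, 1, 1, 1, 2, 2),
  location = c("a", "a", "a", "a", "b", "b"),
  datetime = as.POSIXct(
    c("2014-06-01 08:03:00", "2014-06-01 08:56:00", "2014-06-01 08:58:00",
      "2014-06-01 09:09:00", "2014-06-01 09:54:00", "2014-06-01 10:12:00"),
    tz = "UTC"
  )
)
setkey(dt, id, location, datetime)

# Minutes from each recording to the one two rows below it in the same group.
# shift(..., type = "lead") pads the end of each group with NA, so groups
# with fewer than 3 recordings come out as all NA without any warning.
dt[, diff_min := as.numeric(
  difftime(shift(datetime, 2L, type = "lead"), datetime, units = "mins")
), by = .(id, location)]

# Hypothetical flag: TRUE where 3 consecutive recordings span at most 30 minutes.
dt[, confirmed := !is.na(diff_min) & diff_min <= 30]

print(dt)
```

Because shift() always returns a vector as long as its input, this avoids the length mismatch that triggers the warning in the question, where 3:length(x) counts backwards (3, 2) for a group of only two rows.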
