I have a data.table of timestamp data with a time in and a time out for each person, both rounded to the nearest 15-minute interval. I want to duplicate each row once for every 15-minute interval that its time-in/time-out range covers, and add a new column holding that interval. For example, a person clocking in at 10:00am and clocking out at 11:00am would produce four rows, with times 10:00am, 10:15am, 10:30am, and 10:45am.
3 Answers
#1
I have no idea what your data looks like but here is my best shot.
library(padr)
library(zoo)
#data
user<-"4"
times<-c("10:00","11:15")
times<-as.POSIXct(times,format="%H:%M")
#create df
dt<-data.frame(user,times)
> dt
user times
1 4 2018-05-31 10:00:00
2 4 2018-05-31 11:15:00
#make correct intervals
dt<-pad(dt, interval="15 min")
#carry user id forward
dt<-na.locf(dt)
>dt
user times
1 4 2018-05-31 10:00:00
2 4 2018-05-31 10:15:00
3 4 2018-05-31 10:30:00
4 4 2018-05-31 10:45:00
5 4 2018-05-31 11:00:00
6 4 2018-05-31 11:15:00
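For readers without padr installed, the same fill can be sketched in base R. This is only an illustration of the idea, not the answer's method: the date and time zone are pinned here (an assumption) so the result is reproducible, and the name `filled` is made up for the example.

```r
# Hypothetical single-user data mirroring the example above.
# The date and time zone are fixed only so the result is reproducible.
user  <- "4"
times <- as.POSIXct(c("2018-05-31 10:00", "2018-05-31 11:15"), tz = "UTC")

# One row per 15-minute step from the first to the last timestamp,
# both endpoints included; the single user id is recycled onto every row.
filled <- data.frame(
  user  = user,
  times = seq(min(times), max(times), by = "15 min")
)

print(filled)
```

Because the user id is recycled when the data frame is built, no separate carry-forward step is needed in this version.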
#2
This is actually just a really simple application of transposition and last observation carried forward.
Here is my employee:
emp <- data.frame(empid = 001, timein = as.POSIXct('2018-05-31 8:00'), timeout = as.POSIXct('2018-05-31 17:00'))
Here is my last-observation-carried-forward wrapper (but there is also zoo::na.locf):
locf <- function(y) c(NA, na.omit(y))[cumsum(!is.na(y))+1]
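To see what the wrapper does, here is a small check on a made-up status vector (the vector and the name `carried` are illustrative only). Each NA takes the most recent non-missing value before it, and leading NAs stay NA because nothing precedes them.

```r
# The wrapper from the answer, unchanged.
locf <- function(y) c(NA, na.omit(y))[cumsum(!is.na(y)) + 1]

# Made-up input: a leading gap, an "in" event, two gaps, an "out" event, one gap.
status  <- c(NA, "in", NA, NA, "out", NA)
carried <- locf(status)

print(carried)
# [1] NA    "in"  "in"  "in"  "out" "out"
```

The index `cumsum(!is.na(y)) + 1` counts how many non-missing values have been seen so far, so it points at the latest one in `c(NA, na.omit(y))`, with position 1 reserved for "none seen yet".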
Now transpose:
emplong <- reshape(emp, direction='long', idvar='empid', varying=list(2:3),
times=c('in', 'out'), timevar='status')
This gives:
empid status timein
1.in 1 in 2018-05-31 08:00:00
1.out 1 out 2018-05-31 17:00:00
Now create a roster:
roster <- data.frame('times' = seq(
from=as.POSIXct('2018-05-31 00:00:00'),
to=as.POSIXct('2018-06-01 00:00:00'),
by=15*60))
And merge (the roster's times column is matched against timein in the long table):
roster <- merge(roster, emplong[, -1], by.x='times', by.y='timein', all=TRUE)
And LOCF
roster$status <- locf(roster$status )
roster$status[is.na(roster$status )] <- 'out'
This gives:
> roster
times status
1 2018-05-31 00:00:00 out
2 2018-05-31 00:15:00 out
3 2018-05-31 00:30:00 out
4 2018-05-31 00:45:00 out
5 2018-05-31 01:00:00 out
...
31 2018-05-31 07:30:00 out
32 2018-05-31 07:45:00 out
33 2018-05-31 08:00:00 in
34 2018-05-31 08:15:00 in
...
67 2018-05-31 16:30:00 in
68 2018-05-31 16:45:00 in
69 2018-05-31 17:00:00 out
70 2018-05-31 17:15:00 out
#3
For a data.table solution, assuming your data.table is formatted like this:
library(data.table)
dt <- data.table(
employee = c("John", "Paul", "Mary"),
clock.in = as.POSIXct(c("10:30", "12:30", "13:15"), format = "%R"),
clock.out = as.POSIXct(c("11:00", "13:15", "14:15"), format = "%R")
)
> dt
employee clock.in clock.out
1: John 2018-05-31 10:30:00 2018-05-31 11:00:00
2: Paul 2018-05-31 12:30:00 2018-05-31 13:15:00
3: Mary 2018-05-31 13:15:00 2018-05-31 14:15:00
Use setkey to allow for a join between the base table and one where a 15-minute interval sequence is created between the clock-in and clock-out times:
setkey(dt, employee)
> dt[dt[, seq.POSIXt(clock.in, clock.out, by = 60*15), by = employee]]
employee clock.in clock.out V1
1: John 2018-05-31 10:30:00 2018-05-31 11:00:00 2018-05-31 10:30:00
2: John 2018-05-31 10:30:00 2018-05-31 11:00:00 2018-05-31 10:45:00
3: John 2018-05-31 10:30:00 2018-05-31 11:00:00 2018-05-31 11:00:00
4: Mary 2018-05-31 13:15:00 2018-05-31 14:15:00 2018-05-31 13:15:00
5: Mary 2018-05-31 13:15:00 2018-05-31 14:15:00 2018-05-31 13:30:00
6: Mary 2018-05-31 13:15:00 2018-05-31 14:15:00 2018-05-31 13:45:00
7: Mary 2018-05-31 13:15:00 2018-05-31 14:15:00 2018-05-31 14:00:00
8: Mary 2018-05-31 13:15:00 2018-05-31 14:15:00 2018-05-31 14:15:00
9: Paul 2018-05-31 12:30:00 2018-05-31 13:15:00 2018-05-31 12:30:00
10: Paul 2018-05-31 12:30:00 2018-05-31 13:15:00 2018-05-31 12:45:00
11: Paul 2018-05-31 12:30:00 2018-05-31 13:15:00 2018-05-31 13:00:00
12: Paul 2018-05-31 12:30:00 2018-05-31 13:15:00 2018-05-31 13:15:00
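Note that the join above includes the clock-out time itself as a row (John, 10:30 to 11:00, gets an 11:00 row), whereas the question's example for 10:00 to 11:00 lists only 10:00 through 10:45. A minimal sketch that drops the final slot might look like the following. The pinned date, the UTC time zone, and the column name `interval` are assumptions made for the example, and it assumes every clock.out is strictly later than its clock.in.

```r
library(data.table)

# Same employees as above; the date and time zone are pinned (an assumption)
# so the result does not depend on when or where the code runs.
dt <- data.table(
  employee  = c("John", "Paul", "Mary"),
  clock.in  = as.POSIXct(c("2018-05-31 10:30", "2018-05-31 12:30",
                           "2018-05-31 13:15"), tz = "UTC"),
  clock.out = as.POSIXct(c("2018-05-31 11:00", "2018-05-31 13:15",
                           "2018-05-31 14:15"), tz = "UTC")
)

# One row per 15-minute slot, stopping one step before clock.out.
# Grouping by all three columns keeps them in the result without a join.
# Assumes clock.out is strictly later than clock.in for every row.
expanded <- dt[
  , .(interval = seq(clock.in, clock.out - 15 * 60, by = "15 min")),
  by = .(employee, clock.in, clock.out)
]

print(expanded)
```

With this data John gets two rows, Paul three, and Mary four, so the row count per person equals the number of 15-minute intervals worked.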