How to duplicate rows and add new data in R?

Time: 2022-07-04 09:13:57

I have a data.table of timestamp data with a time-in and a time-out for each person, rounded to the nearest 15-minute interval. I want to duplicate each row once for every 15-minute interval its time-in/time-out span covers, and add a new column that lists that interval. For example, if a person clocks in at 10:00am and clocks out at 11:00am, there should be four rows: one with the time 10:00am, one with 10:15am, one with 10:30am, and one with 10:45am.
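For reference, the transformation described above can be sketched in base R with `rep()` and `sequence()`. This is a minimal sketch, not taken from any of the answers: the column names `person`, `time_in` and `time_out` are hypothetical, times are assumed to already sit on 15-minute boundaries, and UTC is assumed to avoid daylight-saving surprises. Following the example in the question, the clock-in slot is counted but the clock-out slot is not.

```r
# Hypothetical input: one row per shift (column names are assumptions)
shifts <- data.frame(
  person   = c("A", "B"),
  time_in  = as.POSIXct(c("2018-05-31 10:00", "2018-05-31 13:15"), tz = "UTC"),
  time_out = as.POSIXct(c("2018-05-31 11:00", "2018-05-31 14:00"), tz = "UTC")
)

slot <- 15 * 60  # one 15-minute slot, in seconds

# Copies per row: whole slots from time_in up to, but not including, time_out
secs <- as.numeric(difftime(shifts$time_out, shifts$time_in, units = "secs"))
n <- as.integer(round(secs / slot))

# Repeat each row n times
expanded <- shifts[rep(seq_len(nrow(shifts)), times = n), ]

# Offsets 0, 1, 2, ... within each row's copies, converted to timestamps
expanded$interval <- expanded$time_in + (sequence(n) - 1) * slot
rownames(expanded) <- NULL
expanded
```

Person A (10:00 to 11:00) yields the four rows 10:00, 10:15, 10:30 and 10:45 from the question. If the clock-out slot should be included as well, replace `n` with `n + 1` in both the `rep()` and `sequence()` calls.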

3 solutions

#1


I have no idea what your data looks like, but here is my best shot.

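library(padr)  # pad()
library(zoo)   # na.locf()

# Example data: one user with a clock-in and a clock-out time.
# as.POSIXct() with only "%H:%M" fills in today's date.
user <- "4"
times <- c("10:00", "11:15")
times <- as.POSIXct(times, format = "%H:%M")

# Create a data frame with one row per clock event
dt <- data.frame(user, times)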

> dt
  user               times
1    4 2018-05-31 10:00:00
2    4 2018-05-31 11:15:00

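# Insert the missing 15-minute timestamps between the first and last time
dt <- pad(dt, interval = "15 min")

# pad() leaves user as NA in the new rows; carry the last user id forward
dt <- na.locf(dt)

> dt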
  user               times
1    4 2018-05-31 10:00:00
2    4 2018-05-31 10:15:00
3    4 2018-05-31 10:30:00
4    4 2018-05-31 10:45:00
5    4 2018-05-31 11:00:00
6    4 2018-05-31 11:15:00
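On its own, `pad()` fills the gap between the earliest and latest timestamp in the whole column, so data with several users needs to be padded per user. The same per-user padding can be sketched in base R without padr or zoo. This is an illustration under stated assumptions: the data frame `events` and the second user are hypothetical, UTC is assumed, and, like the answer, both the first and the last timestamp are kept.

```r
# Hypothetical long-format data: one row per clock event, two users
events <- data.frame(
  user  = c("4", "4", "7", "7"),
  times = as.POSIXct(c("2018-05-31 10:00", "2018-05-31 11:15",
                       "2018-05-31 09:30", "2018-05-31 10:00"), tz = "UTC")
)

# For each user, build every 15-minute timestamp from the first event to the
# last one (inclusive), then stack the per-user pieces back together
padded <- do.call(rbind, lapply(split(events, events$user), function(d) {
  data.frame(user  = d$user[1],
             times = seq(min(d$times), max(d$times), by = "15 min"))
}))
rownames(padded) <- NULL
padded
```

User 4 gets the same six rows as the output above; user 7 gets 09:30, 09:45 and 10:00.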

#2


This is a simple application of transposition (reshaping from wide to long) and last observation carried forward (LOCF).

Here is an example employee record:

emp <- data.frame(empid = 001, timein = as.POSIXct('2018-05-31 8:00'), timeout = as.POSIXct('2018-05-31 17:00'))

Here is a small last-observation-carried-forward helper (zoo::na.locf does the same job):

locf <- function(y) c(NA, na.omit(y))[cumsum(!is.na(y))+1]
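To see why the one-line `locf()` helper works, it helps to run it on a small status vector like the one the merge step produces. `cumsum(!is.na(y))` counts the non-missing values seen so far, so adding 1 turns it into an index into `c(NA, na.omit(y))`: index 1 is the leading `NA` (nothing observed yet), and every later index points at the most recent observed value. The vector `status` below is a made-up example.

```r
# The LOCF helper from the answer
locf <- function(y) c(NA, na.omit(y))[cumsum(!is.na(y)) + 1]

# Made-up status vector with gaps, as left by the merge step
status <- c(NA, "in", NA, NA, "out", NA)

cumsum(!is.na(status)) + 1  # 1 2 2 2 3 3: indices into c(NA, "in", "out")
locf(status)                # NA "in" "in" "in" "out" "out"
```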

Now reshape to long format, with one row per clock event:

# Stack timein/timeout into a single time column, labelled 'in' or 'out'
emplong <- reshape(emp, direction = 'long', idvar = 'empid', varying = list(2:3),
                   times = c('in', 'out'), timevar = 'status')

This gives:

      empid status              timein
1.in      1     in 2018-05-31 08:00:00
1.out     1    out 2018-05-31 17:00:00

Now create a roster of every 15-minute slot in the day:

roster <- data.frame('times' = seq(
  from=as.POSIXct('2018-05-31 00:00:00'),
  to=as.POSIXct('2018-06-01 00:00:00'),
  by=15*60))

Then merge the clock events onto the roster:

# Match roster slots (times) to clock events (timein); keep all rows of both
roster <- merge(roster, emplong[, -1], by.x = 'times', by.y = 'timein', all = TRUE)

Finally, carry the last status forward:

roster$status <- locf(roster$status)
# Slots before the first clock-in have no status yet, so mark them 'out'
roster$status[is.na(roster$status)] <- 'out'

This gives:

> roster
                 times status
1  2018-05-31 00:00:00    out
2  2018-05-31 00:15:00    out
3  2018-05-31 00:30:00    out
4  2018-05-31 00:45:00    out
5  2018-05-31 01:00:00    out
...
31 2018-05-31 07:30:00    out
32 2018-05-31 07:45:00    out
33 2018-05-31 08:00:00     in
34 2018-05-31 08:15:00     in
...
67 2018-05-31 16:30:00     in
68 2018-05-31 16:45:00     in
69 2018-05-31 17:00:00    out
70 2018-05-31 17:15:00    out
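For a single shift, the merge-then-LOCF result can be cross-checked with a direct comparison: a slot is "in" when it starts at or after the clock-in time and before the clock-out time. This sketch reuses the 08:00 to 17:00 shift from the example and assumes UTC; it is an alternative check, not part of the original answer.

```r
# Same shift as the example (UTC assumed)
timein  <- as.POSIXct("2018-05-31 08:00", tz = "UTC")
timeout <- as.POSIXct("2018-05-31 17:00", tz = "UTC")

# Every 15-minute slot from midnight to midnight, as in the roster above
times <- seq(as.POSIXct("2018-05-31 00:00", tz = "UTC"),
             as.POSIXct("2018-06-01 00:00", tz = "UTC"),
             by = "15 min")

# 'in' for slots in [timein, timeout), 'out' otherwise
roster <- data.frame(times  = times,
                     status = ifelse(times >= timein & times < timeout,
                                     "in", "out"))

nrow(roster)                # 97 slots
sum(roster$status == "in")  # 36 slots = 9 hours * 4
```

Row 33 (08:00) is the first "in" slot and row 69 (17:00) is "out" again, matching the roster printed above.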

#3


For a data.table solution, assuming your data.table is formatted like this:

library(data.table)

# "%R" is the same as "%H:%M"; as.POSIXct() fills in today's date
dt <- data.table(
  employee = c("John", "Paul", "Mary"),
  clock.in = as.POSIXct(c("10:30", "12:30", "13:15"), format = "%R"),
  clock.out = as.POSIXct(c("11:00", "13:15", "14:15"), format = "%R")
)

> dt
   employee            clock.in           clock.out
1:     John 2018-05-31 10:30:00 2018-05-31 11:00:00
2:     Paul 2018-05-31 12:30:00 2018-05-31 13:15:00
3:     Mary 2018-05-31 13:15:00 2018-05-31 14:15:00

Use setkey() so that the base table can be joined to a second table holding a 15-minute sequence that runs from each employee's clock-in time to their clock-out time:

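setkey(dt, employee)

# Inner query: one 15-minute sequence (column V1) per employee, clock-in to clock-out inclusive.
# Outer dt[...]: joins those rows back to dt on the key column, employee.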
> dt[dt[, seq.POSIXt(clock.in, clock.out, by = 60*15), by = employee]]
    employee            clock.in           clock.out                  V1
 1:     John 2018-05-31 10:30:00 2018-05-31 11:00:00 2018-05-31 10:30:00
 2:     John 2018-05-31 10:30:00 2018-05-31 11:00:00 2018-05-31 10:45:00
 3:     John 2018-05-31 10:30:00 2018-05-31 11:00:00 2018-05-31 11:00:00
 4:     Mary 2018-05-31 13:15:00 2018-05-31 14:15:00 2018-05-31 13:15:00
 5:     Mary 2018-05-31 13:15:00 2018-05-31 14:15:00 2018-05-31 13:30:00
 6:     Mary 2018-05-31 13:15:00 2018-05-31 14:15:00 2018-05-31 13:45:00
 7:     Mary 2018-05-31 13:15:00 2018-05-31 14:15:00 2018-05-31 14:00:00
 8:     Mary 2018-05-31 13:15:00 2018-05-31 14:15:00 2018-05-31 14:15:00
 9:     Paul 2018-05-31 12:30:00 2018-05-31 13:15:00 2018-05-31 12:30:00
10:     Paul 2018-05-31 12:30:00 2018-05-31 13:15:00 2018-05-31 12:45:00
11:     Paul 2018-05-31 12:30:00 2018-05-31 13:15:00 2018-05-31 13:00:00
12:     Paul 2018-05-31 12:30:00 2018-05-31 13:15:00 2018-05-31 13:15:00
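The join output above includes both the clock-in and the clock-out slot (John, 10:30 to 11:00, gets three rows), whereas the question's example (10:00 to 11:00 giving four rows) leaves the clock-out slot out. The row counts follow from simple arithmetic on each shift's length; this sketch recomputes them for the three example employees, with a fixed date and UTC assumed.

```r
clock.in  <- as.POSIXct(c("2018-05-31 10:30", "2018-05-31 12:30",
                          "2018-05-31 13:15"), tz = "UTC")
clock.out <- as.POSIXct(c("2018-05-31 11:00", "2018-05-31 13:15",
                          "2018-05-31 14:15"), tz = "UTC")

# Whole 15-minute slots in each shift (John, Paul, Mary)
n_slots <- as.numeric(difftime(clock.out, clock.in, units = "mins")) / 15

n_slots      # 2 3 4: clock-out slot excluded, as in the question
n_slots + 1  # 3 4 5: clock-out slot included, as in the join output
```

To drop the clock-out slot in the data.table answer, the inner sequence could end one slot earlier, e.g. `seq.POSIXt(clock.in, clock.out - 60*15, by = 60*15)`. This assumes every shift is at least 15 minutes long; otherwise `seq.POSIXt()` stops with a wrong-sign error.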
