Extracting a subset of a data frame where records are separated by a specific time period

(I have modified this question to make it more explicit.)

I have a dataset as follows:

data <- structure(list(id = 1:12, personID = c(1L, 2L, 3L, 4L, 4L, 3L, 2L, 1L, 1L, 2L, 3L, 4L), lastName = structure(c(1L, 2L, 3L, 4L, 4L, 3L, 2L, 1L, 1L, 2L, 3L, 4L), .Label = c("james", "joan", "lucy", "mary"), class = "factor"), date = structure(c(5L, 5L, 8L, 9L, 6L, 1L, 3L, 11L, 4L, 2L, 7L, 10L), .Label = c("1/01/2012", "10/04/2011", "11/01/2012", "11/08/2011", "12/01/2012", "12/04/2012", "12/12/2011", "14/01/2012", "16/01/2012", "24/06/2010", "24/06/2011" ), class = "factor"), status = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L)), .Names = c("id", "personID", "lastName", "date", "status"), class = "data.frame", row.names = c(NA, -12L ))

I need to extract a subset of the data frame containing the records of each personID that occur more than once and are separated by more than 8 weeks.

The extraction needs to start from the oldest record for each personID and then select the next (more recent) record for the same personID that is more than 8 weeks after the previous record. Upon finding such a record, it should repeat the process using that more recent record as the new starting point.

Thanks.

2 solutions

#1

How about:

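# Parse the dates first; the question stores them as a factor in day/month/year format
data$date <- as.Date(data$date, format = "%d/%m/%Y")
# Largest gap in days between any two records of each person
maxDiff <- tapply(data$date, data$personID, function(x) max(dist(as.numeric(x))))
subset(data, personID %in% names(maxDiff[maxDiff > (8 * 7)]))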
  id personID lastName       date status
1  1        1    james 2012-01-12      1
4  4        4     mary 2012-01-16      1
5  5        4     mary 2012-04-12      1
8  8        1    james 2011-06-24      1

#2

This will do the trick, though I'm sure someone else can give you a better answer.

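library(plyr)

# Assumes the date column is parsed as a Date; the factor from the question cannot be subtracted
data$date <- as.Date(data$date, format = "%d/%m/%Y")

# Absolute gap between the first two records of a group (any later records are ignored)
diffWeek <- function(df) {
  abs(df$date[1] - df$date[2])
}

eightWeeks <- 7 * 8 # 56 days
aux.data <- ddply(data, "lastName", function(df) diffWeek(df) > eightWeeks)

data[data$lastName %in% aux.data[aux.data[, 2] == TRUE, 1], ] # this will return the data.frame.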

Note that my answer doesn't generalize well. If I have more time I'll try to generalize it. But it should work for now.
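One possible way to generalize this is sketched below in base R. It is not the code from either answer. It follows the scan described in the question: within each personID, start from the oldest record and keep every later record that falls more than 56 days after the last kept one. The helper name `keep_spaced`, the `min_gap` parameter, and the rule of dropping people who end up with only one kept record are assumptions made for illustration.

```r
# Sample data mirroring the personID and date columns from the question.
data <- data.frame(
  id = 1:12,
  personID = c(1L, 2L, 3L, 4L, 4L, 3L, 2L, 1L, 1L, 2L, 3L, 4L),
  date = c("12/01/2012", "12/01/2012", "14/01/2012", "16/01/2012",
           "12/04/2012", "1/01/2012", "11/01/2012", "24/06/2011",
           "11/08/2011", "10/04/2011", "12/12/2011", "24/06/2010"),
  stringsAsFactors = FALSE
)
data$date <- as.Date(data$date, format = "%d/%m/%Y")

# Hypothetical helper: takes one person's dates as day numbers and returns
# 1 for records kept by the oldest-first scan, 0 for the rest.
keep_spaced <- function(days, min_gap = 56) {
  keep <- logical(length(days))
  last_kept <- NA_real_
  for (i in order(days)) {
    # The oldest record is always kept; later ones need a gap above min_gap.
    if (is.na(last_kept) || days[i] - last_kept > min_gap) {
      keep[i] <- TRUE
      last_kept <- days[i]
    }
  }
  as.numeric(keep)
}

days <- as.numeric(data$date)                        # days since 1970-01-01
keep <- ave(days, data$personID, FUN = keep_spaced)  # per-person scan
n_kept <- ave(keep, data$personID, FUN = sum)        # kept records per person

# Assumption: a person is only of interest with at least two kept records.
result <- data[keep == 1 & n_kept >= 2, ]
result
```

Because the scan compares each record with the last kept record rather than only the first two rows of a group, it handles any number of records per person.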
