(I have modified this question to make it more explicit.)
I have a dataset as follows:
data <- structure(list(id = 1:12, personID = c(1L, 2L, 3L, 4L, 4L, 3L, 2L, 1L, 1L, 2L, 3L, 4L), lastName = structure(c(1L, 2L, 3L, 4L, 4L, 3L, 2L, 1L, 1L, 2L, 3L, 4L), .Label = c("james", "joan", "lucy", "mary"), class = "factor"), date = structure(c(5L, 5L, 8L, 9L, 6L, 1L, 3L, 11L, 4L, 2L, 7L, 10L), .Label = c("1/01/2012", "10/04/2011", "11/01/2012", "11/08/2011", "12/01/2012", "12/04/2012", "12/12/2011", "14/01/2012", "16/01/2012", "24/06/2010", "24/06/2011" ), class = "factor"), status = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L)), .Names = c("id", "personID", "lastName", "date", "status"), class = "data.frame", row.names = c(NA, -12L ))
I need to extract a subset of the data frame containing the records of each person who occurs more than once, where those records are more than 8 weeks apart.
The extraction needs to start from the oldest record and then select the next (more recent) record for the same personID that is more than 8 weeks after the previous record. Upon finding such a record, it should repeat the process using that more recent record as the new starting point.
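One possible reading of this rule, as a minimal sketch rather than a definitive solution: walk each person's records from oldest to newest and keep a record only when it falls more than 56 days after the last record kept. The `toy` data frame and the `select_spaced()` helper are hypothetical names introduced for illustration, and the sketch assumes the dates have already been converted to class `Date` and that the oldest record of each person is always kept as the starting point.

```r
# Hypothetical toy data, not the data set from the question
toy <- data.frame(
  personID = c(1L, 1L, 1L, 1L, 2L, 2L),
  date = as.Date(c("2011-01-01", "2011-02-01", "2011-03-15",
                   "2011-06-01", "2011-01-01", "2011-01-20"))
)

# select_spaced() is a hypothetical helper: given one person's dates
# (as day counts), visit them from oldest to newest and flag a record
# only if it is more than `gap` days after the last flagged record.
select_spaced <- function(dates, gap = 56) {
  keep <- logical(length(dates))
  last <- NULL
  for (i in order(dates)) {
    if (is.null(last) || as.numeric(dates[i] - last) > gap) {
      keep[i] <- TRUE
      last <- dates[i]   # this record becomes the new starting point
    }
  }
  keep
}

# Apply the helper per person; ave() returns 0/1, so convert back to logical
keep <- as.logical(ave(as.numeric(toy$date), toy$personID, FUN = select_spaced))

# Drop people who end up with only their starting record
n_kept <- ave(as.numeric(keep), toy$personID, FUN = sum)
result <- toy[keep & n_kept > 1, ]
```

Here `result` holds only the records that form a chain of visits spaced more than 8 weeks apart. Whether the starting record itself belongs in the output is an assumption; dropping it would only need a change to the final filter.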
Thanks.
2 solutions
#1
1
How about:
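# dist() needs numeric input, so convert the factor dates to Date first
data$date <- as.Date(data$date, format = "%d/%m/%Y")
maxDiff <- tapply(as.numeric(data$date), data$personID, function(x) max(dist(x)))
subset(data, personID %in% names(maxDiff[maxDiff > (8 * 7)]))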
  id personID lastName       date status
1  1        1    james 2012-01-12      1
4  4        4     mary 2012-01-16      1
5  5        4     mary 2012-04-12      1
8  8        1    james 2011-06-24      1
#2
0
This will do the trick, though I'm sure someone else can give you a better answer.
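library(plyr)
# The dates are stored as a factor, so convert them to Date first
data$date <- as.Date(data$date, format = "%d/%m/%Y")
diffWeek <- function(df) {
  # Gap in days between the first two records of a group only
  abs(as.numeric(difftime(df$date[1], df$date[2], units = "days")))
}
eightWeeks <- 7 * 8  # 56 days
aux.data <- ddply(data, "lastName", function(df) diffWeek(df) > eightWeeks)
data[data$lastName %in% aux.data[aux.data[, 2] == TRUE, 1], ]  # this will return the data.frame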
Note that my answer doesn't generalize well. If I have more time I'll try to generalize it. But it should work for now.
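As a rough sketch of one way this could be generalized (an assumption, not the answer author's method): instead of comparing only the first two records in each group, compare the oldest and newest record per person using base R's `ave()`. The `toy` data frame below is hypothetical and assumes the dates are already of class `Date`.

```r
# Hypothetical toy data: james's first two rows are close together,
# but his oldest and newest records are far apart
toy <- data.frame(
  lastName = c("james", "james", "james", "joan", "joan"),
  date = as.Date(c("2012-01-12", "2011-12-20", "2011-06-24",
                   "2012-01-12", "2012-01-11"))
)

eightWeeks <- 7 * 8  # 56 days

# Span in days between the oldest and newest record of each person,
# repeated on every row belonging to that person
span <- ave(as.numeric(toy$date), toy$lastName,
            FUN = function(d) max(d) - min(d))

# Keep all rows of people whose records span more than 8 weeks
result <- toy[span > eightWeeks, ]
```

This keeps every row for james even though his first two rows are less than 8 weeks apart, which is the case a first-two-rows comparison would miss.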