While loop with conditions using data.table

Time: 2022-09-16 11:42:54

I am new to loops in R and have a relatively simple dataset to process. My sample dataset consists of timestamps (time), a cell phone ID (id), and the phone's battery level (level). My objective is to produce an output that gives the rate of battery decline over time, taking recharge cycles into account. A new cycle begins wherever a record's level is greater than the previous record's level. In other words, while level <= lag(level) the cycle should continue, but as soon as level > lag(level) a new cycle should start.

> test
                   time id level
 1: 2017-12-25 14:10:03  1    81
 2: 2017-12-25 14:20:03  1    81
 3: 2017-12-25 14:30:04  1    81
 4: 2017-12-25 14:40:04  1    73
 5: 2017-12-25 14:50:04  1    70
 6: 2017-12-25 15:00:03  1    70
 7: 2017-12-25 15:10:04  1    65
 8: 2017-12-25 15:20:04  1    62
 9: 2017-12-25 15:30:04  1    61
10: 2017-12-25 15:40:04  1    60
11: 2017-12-25 15:50:03  1    60
12: 2017-12-25 16:00:04  1    60
13: 2017-12-25 16:10:04  1    95
14: 2017-12-25 16:20:03  1    95
15: 2017-12-25 16:30:04  1    95
16: 2017-12-25 16:40:04  1    94
17: 2017-12-25 16:50:04  1    92
18: 2017-12-25 17:00:03  1    90
19: 2017-12-25 17:10:04  1    81
20: 2017-12-25 17:20:03  1    79
21: 2017-12-25 17:30:03  2   100
22: 2017-12-25 17:40:03  2   100
23: 2017-12-25 17:50:03  2   100
24: 2017-12-25 18:00:03  2    90
25: 2017-12-25 18:10:03  2    85
26: 2017-12-25 18:20:03  2    75
27: 2017-12-25 18:30:04  2    65
28: 2017-12-25 18:40:03  2    54
29: 2017-12-25 18:50:03  2    32
30: 2017-12-25 19:00:03  2    11
31: 2017-12-25 19:10:04  2    92
32: 2017-12-25 19:20:04  2    92
33: 2017-12-25 19:30:03  2    91
34: 2017-12-25 19:40:04  2    90
35: 2017-12-25 19:50:04  2    90
36: 2017-12-25 20:00:03  2    81
37: 2017-12-25 20:10:03  2    79
38: 2017-12-25 20:20:04  2    99
39: 2017-12-25 20:30:04  2    96
40: 2017-12-25 20:40:03  2    96

For the sample dataset above, the intended output would look like this, where difftime is the time in minutes between the start and the end of a cycle, diffcharge is the drop in battery level between the start and the end of the cycle, and rate = diffcharge / difftime:

> outcome
  id               start            recharge difftime diffcharge      rate
1  1 2017-12-25 14:10:03 2017-12-25 16:00:04  110.0167          21 0.1908801
2  1 2017-12-25 16:10:04 2017-12-25 17:20:03  69.98333          16 0.2286259
3  2 2017-12-25 17:30:03 2017-12-25 19:00:03        90          89 0.9888889
4  2 2017-12-25 19:10:04 2017-12-25 20:10:03  59.98333          13 0.2167269
5  2 2017-12-25 20:20:04 2017-12-25 20:40:03  19.98333           3 0.1501251
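As a quick check of these definitions, the first row of the expected output can be reproduced by hand. This is a minimal base-R sketch; the two timestamps are typed in as UTC strings purely for illustration, which does not affect the gap between them:

```r
start    <- as.POSIXct("2017-12-25 14:10:03", tz = "UTC")
recharge <- as.POSIXct("2017-12-25 16:00:04", tz = "UTC")

# elapsed time in minutes: 1 h 50 min 1 s, i.e. about 110.0167 min
difftime_min <- as.numeric(difftime(recharge, start, units = "mins"))
# battery drop over the cycle: level at the start (81) minus level at the end (60)
diffcharge <- 81 - 60
# about 0.1909 level units per minute
rate <- diffcharge / difftime_min
```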

So far I have tried to write a while loop that concatenates the levels within each cycle, so that I can then take the min, max, etc., using the code below, but it does not produce the intended output.

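raw_data <- test
unique_id = unique(test$id)

for (id in unique_id)
{
  # caution: inside a data.table's [ ], `id` refers to the id column,
  # not to this loop variable, so every row is selected here
  onePhone <- raw_data[ which(raw_data$id == id), ]
  onePhone <- onePhone[order(onePhone$time, decreasing = FALSE),]
  cycle <- NULL

  if(nrow(onePhone) >=2 ){
    for(i in 2:nrow(onePhone)) {
      # note: changing i inside the while loop does not change the
      # value the for loop assigns to i on its next iteration
      while(onePhone[i-1,"level"] >= onePhone[i,"level"])
      { 
        i = i+1
        cycle <- c(cycle, onePhone[i,"level"])
      } 
      print(cycle)
    }
  }
}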

Any advice on how to use data.table, dplyr, or a simple while loop would be appreciated. Here is the sample data:

> dput(test)
structure(list(time = structure(c(1514229003.91212, 1514229603.61297, 
1514230204.14629, 1514230804.81938, 1514231404.36784, 1514232003.73393, 
1514232604.17933, 1514233204.00143, 1514233804.68755, 1514234404.15599, 
1514235003.99419, 1514235604.68204, 1514236204.18828, 1514236803.66526, 
1514237404.0434, 1514238004.40609, 1514238604.02003, 1514239203.42238, 
1514239804.19495, 1514240403.15927, 1514241003.87092, 1514241603.93167, 
1514242203.77223, 1514242803.66758, 1514243403.33705, 1514244003.25017, 
1514244604.05367, 1514245203.7921, 1514245803.2651, 1514246403.63888, 
1514247004.02684, 1514247604.04009, 1514248203.99929, 1514248804.07401, 
1514249404.11004, 1514250003.74613, 1514250603.88962, 1514251204.19115, 
1514251804.06932, 1514252403.94181), class = c("POSIXct", "POSIXt"
), tzone = "EST"), id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2), level = c(81, 81, 81, 73, 70, 70, 65, 62, 
61, 60, 60, 60, 95, 95, 95, 94, 92, 90, 81, 79, 100, 100, 100, 
90, 85, 75, 65, 54, 32, 11, 92, 92, 91, 90, 90, 81, 79, 99, 96, 
96)), .Names = c("time", "id", "level"), class = c("data.table", 
"data.frame"), row.names = c(NA, -40L))

3 solutions

#1

Using @Hugh's approach for the first step, then computing the end result:

test[, cycle := cumsum(level > shift(level, fill = first(level))), by = "id"]
x <- test[, .(start = min(time),
              recharge = max(time),
              diffcharge = max(level) - min(level)),
          by = .(id, cycle)]
x[, difftime := as.numeric(difftime(recharge, start, units = "mins"))]
x[, rate :=  diffcharge / difftime]
x
#    id cycle               start            recharge diffcharge  difftime      rate
# 1:  1     0 2017-12-25 14:10:03 2017-12-25 16:00:04         21 110.01283 0.1908868
# 2:  1     1 2017-12-25 16:10:04 2017-12-25 17:20:03         16  69.98285 0.2286274
# 3:  2     0 2017-12-25 17:30:03 2017-12-25 19:00:03         89  89.99613 0.9889314
# 4:  2     1 2017-12-25 19:10:04 2017-12-25 20:10:03         13  59.99771 0.2166749
# 5:  2     2 2017-12-25 20:20:04 2017-12-25 20:40:03          3  19.99584 0.1500312
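Note that subtracting two POSIXct values (recharge - start) lets R pick the difftime units automatically from the smallest gap in the vector; with this sample every cycle lasts between one minute and one hour, so the result happens to be in minutes. Passing units = "mins" to difftime() makes that explicit. A small base-R check using the first cycle's timestamps (typed in as UTC for illustration):

```r
t_start <- as.POSIXct("2017-12-25 14:10:03", tz = "UTC")
t_end   <- as.POSIXct("2017-12-25 16:00:04", tz = "UTC")

# plain subtraction: R chooses the units, here "hours" because the gap exceeds 1 h
auto <- t_end - t_start
units(auto)        # "hours"
as.numeric(auto)   # about 1.83, not 110

# explicit units always give minutes
mins <- as.numeric(difftime(t_end, t_start, units = "mins"))
mins               # about 110.02
```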

#2

If test is a data.table, you can use cumsum with shift. (shift is a data.table function; with its default type = "lag" it does the same thing as lag.)

test[, cycle := cumsum(level > shift(level, fill = first(level))), by = "id"]
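To see why cumsum(level > shift(...)) numbers the cycles, here is the same idea applied to id 1's levels in base R. The c(level[1], head(level, -1)) line is only a stand-in for shift(level, fill = first(level)), used so the sketch runs without data.table:

```r
# battery levels for id 1 from the sample data
level <- c(81, 81, 81, 73, 70, 70, 65, 62, 61, 60, 60, 60,
           95, 95, 95, 94, 92, 90, 81, 79)
# previous reading; the first reading is compared with itself
prev <- c(level[1], head(level, -1))
rises <- level > prev     # TRUE where the battery went up (a recharge)
cycle <- cumsum(rises)    # every rise starts a new cycle number
cycle
# [1] 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
```

Grouping by id as well (by = "id" in the data.table call) restarts the numbering for each phone.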

#3

Assuming you read test from a CSV file with separate Date and time columns:

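library(dplyr)      # filter()
library(lubridate)  # dmy_hms()

test <- read.csv("test.csv", stringsAsFactors = FALSE)
# combine the Date and time columns into one string
test$DateTime <- paste(test$Date, test$time, sep = " ")
# Charge[i] is TRUE when the next reading is higher, i.e. row i ends a cycle.
# Rows are not split by id, so this relies on each new id starting at a
# higher level than the previous id ended (true for the sample data).
test$Charge <- FALSE
test$Charge[1:(nrow(test) - 1)] <- diff(test$level) > 0

# a cycle starts at row 1 and at every row right after a Charge row
start <- test[which(test$Charge) + 1, ]$DateTime
start <- c(test$DateTime[1], start)
start <- dmy_hms(start)

# a cycle ends at every Charge row and at the last row
recharge <- filter(test, Charge)$DateTime
recharge <- c(recharge, tail(test$DateTime, 1))
recharge <- dmy_hms(recharge)

# explicit units, so the result is always in minutes
difftime <- difftime(recharge, start, units = "mins")

startLevel <- test[which(test$Charge) + 1, ]$level
startLevel <- c(test$level[1], startLevel)
endLevel <- filter(test, Charge)$level
endLevel <- c(endLevel, tail(test$level, 1))

diffcharge <- startLevel - endLevel

rate <- diffcharge / as.numeric(difftime)

id <- filter(test, Charge)$id
id <- c(id, tail(test$id, 1))

outcome <- data.frame(id = id, start = start, recharge = recharge,
                      difftime = difftime, diffcharge = diffcharge, rate = rate)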
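The question also asks about a simple while loop. As an alternative to the vectorised answers, here is a minimal base-R sketch of that route; the helper cycle_rates and the data frame test_df are hypothetical, and test_df uses exact 10-minute readings instead of the original timestamps. An inner while loop extends each cycle until the level rises, which is the rule stated in the question:

```r
# Hypothetical sample data: same ids and levels as the question, but with
# readings at exact 10-minute steps instead of the original timestamps
test_df <- data.frame(
  time  = as.POSIXct("2017-12-25 14:10:00", tz = "UTC") + 600 * (0:39),
  id    = rep(c(1, 2), each = 20),
  level = c(81, 81, 81, 73, 70, 70, 65, 62, 61, 60, 60, 60, 95, 95, 95, 94, 92, 90, 81, 79,
            100, 100, 100, 90, 85, 75, 65, 54, 32, 11, 92, 92, 91, 90, 90, 81, 79, 99, 96, 96)
)

# Hypothetical helper: one output row per discharge cycle
cycle_rates <- function(df) {
  df <- df[order(df$id, df$time), ]
  out <- list()
  for (phone in unique(df$id)) {
    one <- df[df$id == phone, ]
    n <- nrow(one)
    i <- 1
    while (i <= n) {
      first_row <- i
      # extend the cycle while the next reading is not higher
      while (i < n && one$level[i + 1] <= one$level[i]) {
        i <- i + 1
      }
      last_row <- i
      mins <- as.numeric(difftime(one$time[last_row], one$time[first_row],
                                  units = "mins"))
      charge_drop <- one$level[first_row] - one$level[last_row]
      out[[length(out) + 1]] <- data.frame(
        id = phone,
        start = one$time[first_row],
        recharge = one$time[last_row],
        difftime = mins,
        diffcharge = charge_drop,
        rate = charge_drop / mins
      )
      # move past this cycle; any next reading is higher and starts a new cycle
      i <- i + 1
    }
  }
  do.call(rbind, out)
}

res <- cycle_rates(test_df)
res
```

On this simplified data the five rows have difftime 110, 70, 90, 60 and 20 minutes and diffcharge 21, 16, 89, 13 and 3, matching the expected outcome up to the seconds dropped from the timestamps. A cycle made of a single reading would give difftime 0 and a rate of NaN.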
