Calculating the mean and standard deviation and replacing values in R

Time: 2021-12-07 22:55:38

I have the data frame shown below: a date-time column with the corresponding signal value (RSSI).

  1. I need to replace all positive values with 0.

  2. Once they are replaced, for every 60-second window I need to calculate the mean and standard deviation, and replace any value that deviates a lot from the mean with the mean.

For example, for the first 60 seconds, if the value at 2017-08-23 07:49:58 deviates from the mean by more than the standard deviation, it should be replaced by the mean. That means "-59" should be replaced by the mean.

     date-time             RSSI
    2017-08-23 07:49:38    -68
    2017-08-23 07:49:48    -69
    2017-08-23 07:49:58    -59
    2017-08-23 07:50:08    -65
    2017-08-23 07:50:18     127
    2017-08-23 07:50:28    -74
    2017-08-23 07:50:38     127
    2017-08-23 07:50:48    -74
    2017-08-23 07:50:58     127
    2017-08-23 07:51:08    -74
    2017-08-23 07:51:18    -65
    2017-08-23 07:51:28     127
    2017-08-23 07:51:38    -59
    2017-08-23 07:51:48    -62
    2017-08-23 07:51:58    -57

Expected output:

Output 1:

  date-time              RSSI
  2017-08-23 07:49:38   -68
  2017-08-23 07:49:48   -69
  2017-08-23 07:49:58   -59
  2017-08-23 07:50:08   -65
  2017-08-23 07:50:18    0

Output 2:

  date-time              RSSI
  2017-08-23 07:49:38   -68
  2017-08-23 07:49:48   -69
  2017-08-23 07:49:58   **-62**
  2017-08-23 07:50:08   -65
  2017-08-23 07:50:18   **-62**

Here -62 is the mean, and it replaces the deviating values.
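The two steps described above can be sketched roughly as follows. This is a minimal sketch, assuming that "every 60 seconds" means windows counted from the first sample (not calendar minutes); the column names `time` and `window` are illustrative and not part of the original data.

```r
library(data.table)

# First eight rows of the sample data; `time` and `window` are illustrative names.
dt <- data.table(
  time = as.POSIXct("2017-08-23 07:49:38", tz = "UTC") + seq(0, 70, by = 10),
  RSSI = c(-68, -69, -59, -65, 127, -74, 127, -74)
)

# Step 1: replace every positive reading with 0, as the question asks.
dt[RSSI > 0, RSSI := 0]

# Step 2 (grouping only): seconds since the first sample, cut into 60-second windows.
dt[, window := as.numeric(difftime(time, min(time), units = "secs")) %/% 60]

print(dt)
```

With this data the first six rows fall into window 0 and the last two into window 1; the per-window mean and standard deviation can then be computed with `by = window`.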

1 solution

#1


Avoid for loops in R where you can. Prefer vectorized solutions, and if you need performance, the data.table package is usually what you want.

library(data.table)

# Sample data: timestamps with the corresponding RSSI readings
dt <- data.table(
  "date-time" = as.POSIXct(c("2017-08-23 07:49:38", "2017-08-23 07:49:48", "2017-08-23 07:49:58",
                             "2017-08-23 07:50:08", "2017-08-23 07:50:18", "2017-08-23 07:50:28")),
  RSSI = c(-68, -69, -59, -65, 127, -74)
)

dt[RSSI > 0, RSSI := NA_real_]  # mark positive readings as NA (rather than 0) so they do not distort the mean and sd
print(dt)
dt[, minute := floor(as.numeric(`date-time`) / 60)]  # minute bucket each timestamp falls into
# calculate the mean and standard deviation per minute, ignoring the NA readings
dt[, c("mean", "stdev") := list(mean(RSSI, na.rm = TRUE), sd(RSSI, na.rm = TRUE)), by = minute]
# replace readings more than one standard deviation from the mean, and the NA readings, with the rounded mean
dt[abs(RSSI - mean) > stdev | is.na(RSSI), RSSI := round(mean)]
print(dt)

The solution you want should look similar to this. To read a CSV file into a data.table, the fread function works best.
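As a rough illustration of the fread suggestion, the sketch below reads a few rows from inline text. The CSV layout (comma-separated, with a `date-time,RSSI` header) and the UTC time zone are assumptions, since the original file is not shown; in practice you would pass a file path to fread instead of `text =`.

```r
library(data.table)

# Hypothetical CSV content standing in for the real file.
csv_text <- "date-time,RSSI
2017-08-23 07:49:38,-68
2017-08-23 07:49:48,-69
2017-08-23 07:49:58,-59"

# Read the timestamps as character first, then convert explicitly, so the
# result does not depend on fread's automatic date-time detection.
dt <- fread(text = csv_text, colClasses = c("character", "numeric"))
dt[, `date-time` := as.POSIXct(`date-time`, tz = "UTC")]

str(dt)
```

Converting the timestamp column explicitly keeps the time zone choice visible in the code.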
