How to automatically split time-series data at a set time interval and summarize each subset?

I am looking for a way to process a raw data file of time vs. rainfall depth, recorded at two-minute intervals, in which many of the values are zero:

 Date.time <- c("9/26/2014 15:15", 
"9/26/2014 15:12", 
"9/26/2014 15:14", 
"9/26/2014 15:16", 
"9/26/2014 15:18",
"9/26/2014 15:20",
"9/26/2014 15:22",
"9/26/2014 15:24",
"9/26/2014 15:26",
"9/26/2014 15:34",
"9/26/2014 15:36",
"9/26/2014 15:38",
"9/26/2014 15:40",
"9/26/2014 15:42",
"9/26/2014 15:44",
"9/26/2014 15:46")

Rain <- c(0,.05,.1,.03,0,0,.2,0,0,0,0,0,.04,.1,.15,.22)
my.df <- data.frame(Date.time, Rain)

Does anyone know how I could lump the "Rain" column into separate categories based on how many zero values precede each reading? I would like to label or extract each set of non-zero readings that is separated from the next set by a predefined number of minutes (or number of zero values). For instance, if I decide that non-zero values separated by 10 minutes or more belong to two separate subsets, I would like to label them accordingly, or extract summary data for each subset, based on that criterion. In that case, the 0.05, 0.1, 0.03, and 0.2 values would be lumped together, because the 0.03 and 0.2 values are separated by only 4 minutes of zero readings (two readings), not ten, so under my arbitrary threshold they are not separate. The next set of non-zero values (0.04, 0.1, 0.15, and 0.22) is separated from the 0.2 value by ten minutes of zero readings, which meets the threshold. Does anybody have any ideas where I could go from here?
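The zero-count rule described above can be sketched in base R without extra packages. This is a minimal sketch under stated assumptions, not a definitive implementation: readings are taken to be evenly spaced at two minutes with no missing rows, so "10 minutes of zeros" means 5 zero readings, and `min_zeros` is a hypothetical parameter name. Because it counts rows rather than minutes, it can disagree with a time-based rule when rows are missing (in the sample, the 15:28 to 15:32 readings are absent).

```r
# Rain readings from the question, one per two-minute step
Rain <- c(0, .05, .1, .03, 0, 0, .2, 0, 0, 0, 0, 0, .04, .1, .15, .22)

# Assumption: 10 minutes of zeros = 5 readings at 2 minutes each
min_zeros <- 5

idx <- which(Rain > 0)             # positions of the non-zero readings
zeros_before <- diff(idx) - 1      # number of zeros between consecutive non-zero readings

# The first non-zero reading opens event 1; a later one opens a new
# event when at least min_zeros zeros precede it
event <- cumsum(c(TRUE, zeros_before >= min_zeros))

grp <- rep(NA_integer_, length(Rain))  # zero readings get no event (NA)
grp[idx] <- event
grp
```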

EDIT: Ideally I would like to remove the zero values, which is easy enough:

 my.df2 <- subset(my.df, Rain>0)

Then, with my.df2, I would like to find readings that fall within 10 minutes of each other and group each such chain of consecutive Rain readings into a single "event". A conceptual output might look like this:

     Date.time     Rain     Event
9/26/2014 15:12     0.05     A
9/26/2014 15:14     0.10     A
9/26/2014 15:16     0.03     A
9/26/2014 15:22     0.20     A
9/26/2014 15:40     0.04     B
9/26/2014 15:42     0.10     B
9/26/2014 15:44     0.15     B
9/26/2014 15:46     0.22     B
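One way to get the conceptual output above is to work directly with the timestamps in my.df2. The following is a minimal base-R sketch, not a definitive implementation. Assumptions: timestamps are parsed with `as.POSIXct()` using the `%m/%d/%Y %H:%M` format, and UTC is an arbitrary choice of time zone; a gap is measured between the timestamps of consecutive non-zero readings (so 15:16 to 15:22 counts as 6 minutes), and a gap of 10 minutes or more starts a new event. The names `threshold_min`, `gap_min`, and `event_id` are illustrative.

```r
# Data from the question
Date.time <- c("9/26/2014 15:15", "9/26/2014 15:12", "9/26/2014 15:14",
               "9/26/2014 15:16", "9/26/2014 15:18", "9/26/2014 15:20",
               "9/26/2014 15:22", "9/26/2014 15:24", "9/26/2014 15:26",
               "9/26/2014 15:34", "9/26/2014 15:36", "9/26/2014 15:38",
               "9/26/2014 15:40", "9/26/2014 15:42", "9/26/2014 15:44",
               "9/26/2014 15:46")
Rain <- c(0, .05, .1, .03, 0, 0, .2, 0, 0, 0, 0, 0, .04, .1, .15, .22)
my.df <- data.frame(Date.time, Rain, stringsAsFactors = FALSE)

# Drop the zero readings, as in the EDIT
my.df2 <- subset(my.df, Rain > 0)

# Parse the timestamps (time zone assumed to be UTC) and sort by time
my.df2$time <- as.POSIXct(my.df2$Date.time, format = "%m/%d/%Y %H:%M", tz = "UTC")
my.df2 <- my.df2[order(my.df2$time), ]

# Minutes since the previous non-zero reading (Inf for the first one)
gap_min <- c(Inf, diff(as.numeric(my.df2$time)) / 60)

# Assumed rule: a gap of 10 minutes or more between timestamps opens a new event
threshold_min <- 10
event_id <- cumsum(gap_min >= threshold_min)
my.df2$Event <- LETTERS[event_id]  # LETTERS only covers up to 26 events
my.df2[, c("Date.time", "Rain", "Event")]
```

Measuring gaps between timestamps rather than counting zero rows is not thrown off by missing rows such as the absent 15:28 to 15:32 readings; for this sample both definitions give the same two events.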

Thank you so much.

1 Solution

#1

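This starts a new group whenever more than k consecutive zeros separate a reading from the prior group. We define an na.locf wrapper with the desired default arguments and then use it in the next line to compute grp. grp is 1 for the first group, 2 for the next group, and so on; it is NA for positions that do not belong to any group. The key to all this is the maxgap argument of na.locf, which leaves runs of more than maxgap NAs in place and fills shorter runs by carrying the last observation forward. With two-minute readings, the question's 10-minute rule corresponds to five zeros, i.e. k = 4; k = 3 gives the same grouping for this sample: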

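library(zoo)

k <- 3  # a run of more than k zeros separates two groups

Rain <- my.df$Rain
Rain[Rain == 0] <- NA  # treat zero readings as missing
# carry the last value forward, but only across runs of at most k NAs
nalocf <- function(x) na.locf(x, maxgap = k, na.rm = FALSE)

# cumsum(...) counts group starts; nalocf(0 * Rain) is 0 inside a group and NA outside
grp <- cumsum(diff(!is.na(c(NA, nalocf(Rain)))) > 0) + nalocf(0 * Rain)
# keep only the non-zero readings (zeros bridged inside a group also get a grp value)
subset(cbind(my.df, grp), Rain > 0)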
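The grp line depends on how na.locf handles runs of NA when maxgap is set: per the zoo documentation, runs of more than maxgap NAs are retained and shorter ones are filled by carrying the last observation forward. To make that rule concrete without requiring zoo, here is a minimal base-R sketch. `locf_maxgap` is a hypothetical helper written only for illustration, not a zoo function, and it is not guaranteed to match na.locf in every edge case.

```r
# Hypothetical helper (not part of zoo): a base-R stand-in for
# na.locf(x, maxgap = maxgap, na.rm = FALSE), written only to show the rule
locf_maxgap <- function(x, maxgap) {
  runs <- rle(is.na(x))
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  out <- x
  for (i in seq_along(runs$lengths)) {
    # Fill a run of NAs only if it is at most maxgap long and a value precedes it;
    # leading NAs and longer runs stay NA
    if (runs$values[i] && runs$lengths[i] <= maxgap && starts[i] > 1) {
      out[starts[i]:ends[i]] <- x[starts[i] - 1]
    }
  }
  out
}

# The question's data with zeros turned into NA, as in the answer
Rain <- c(0, .05, .1, .03, 0, 0, .2, 0, 0, 0, 0, 0, .04, .1, .15, .22)
Rain[Rain == 0] <- NA
filled <- locf_maxgap(Rain, maxgap = 3)
filled
```

The two zeros at 15:18 and 15:20 are bridged with 0.03, while the run of five zeros after 0.2 stays NA; that NA run is what lets the cumsum in the grp line start a second group.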

The result in this case is:

         Date.time Rain grp
2  9/26/2014 15:12 0.05   1
3  9/26/2014 15:14 0.10   1
4  9/26/2014 15:16 0.03   1
7  9/26/2014 15:22 0.20   1
13 9/26/2014 15:40 0.04   2
14 9/26/2014 15:42 0.10   2
15 9/26/2014 15:44 0.15   2
16 9/26/2014 15:46 0.22   2
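The question also asks for summary data for each subset. A minimal base-R sketch, assuming the grouped rows are held in a data frame (here typed in from the result above as `res`) and that rows within each group are in time order; `summarise_group` and `event_summary` are hypothetical names chosen for illustration.

```r
# The grouped rows shown above, typed in so this snippet runs on its own
res <- data.frame(
  Date.time = c("9/26/2014 15:12", "9/26/2014 15:14", "9/26/2014 15:16",
                "9/26/2014 15:22", "9/26/2014 15:40", "9/26/2014 15:42",
                "9/26/2014 15:44", "9/26/2014 15:46"),
  Rain = c(0.05, 0.10, 0.03, 0.20, 0.04, 0.10, 0.15, 0.22),
  grp  = c(1, 1, 1, 1, 2, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one summary row per group
# (total depth, number of readings, first and last timestamp)
summarise_group <- function(d) {
  data.frame(grp = d$grp[1], total = sum(d$Rain), n = nrow(d),
             start = d$Date.time[1], end = d$Date.time[nrow(d)],
             stringsAsFactors = FALSE)
}

# split() gives one data frame per group; rbind() stacks the summaries
event_summary <- do.call(rbind, lapply(split(res, res$grp), summarise_group))
event_summary
```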
