如何提高r中这种线性插值的性能

For a given column in a dataframe, I want to construct a new vector which for each point consists of the average of the points on either side. However for the last observation it will instead be the second to last. And for the first observation it will be second. I wrote this R code to solve the issue, however I am calling it repeatedly and it is extremely slow. Can someone give some tips on how to do it more efficiently? Thanks.

对于数据框中的给定列，我想构造一个新的向量，每个点由两侧的点的平均值组成。然而，对于最后一次观察，它将是倒数第二次。而对于第一次观察，它将是第二次。我写了这个R代码来解决这个问题，但是我反复调用它并且它非常慢。有人可以提供一些有关如何更有效地做到这一点的提示吗？谢谢。

x1 <- c(rep('a',100),rep('b',100),rep('c',100))
x2 <- rnorm(300)
x <- data.frame(x1,x2)
names(x) <- c('col1','data1')


a.linear.interpolation <- function(x) {
    require(zoo)
    require(data.table)

    a.dattab <- data.table(x)

    setkey(a.dattab,col1)

    #replace any NA values using LOCF / NOCB
    a.dattab[,data1:=na.locf(data1,na.rm=FALSE),by=list(col1)]
    a.dattab[,data1:=na.locf(data1,na.rm=FALSE,fromLast=TRUE),by=list(col1)]

    #Adding a within group sequence number and a size of group field to facilitate
    #row by row processing
    a.dattab[,grpseq:=seq_len(.N),by=list(col1)]
    a.dattab[,grpseq_max:=.N,by=list(col1)]

    #convert back to data.frame
    #data.frame seems faster than data.table for this row by row type processing
    a.df <- data.frame(a.dattab)

    new.col <- vector(length=nrow(a.df))

    for(i in seq(nrow(a.df))){
        if(a.df[i,"grpseq"]==1){
                new.col[i] <- a.df[i+1,"data1"]
            }
        else if(a.df[i,"grpseq"]==a.df[i,"grpseq_max"]){
                new.col[i] <- a.df[i-1,"data1"]
            }
        else {
                new.col[i] <- (a.df[i-1,"data1"]+a.df[i+1,"data1"])/2
            }
    }

    return(new.col)
}

1 个解决方案

#1

Apart from using rollmeans, the base R filter function can do this sort of thing as well. E.g.:

除了使用rollmeans之外，基本R过滤器功能也可以做这种事情。例如。：

linint <- function(vec) {
  c(vec[2], filter(vec, c(0.5, 0, 0.5))[-c(1, length(vec))], vec[length(vec) - 1])
}

x <- c(1,3,6,10,1)
linint(x)
#[1]  3.0  3.5  6.5  3.5 10.0

And it's pretty quick, chewing through 10M cases in less than a second:

它非常快，在不到一秒的时间内咀嚼10M案例：

x <- rnorm(1e7)
system.time(linint(x))
#user  system elapsed 
#0.57    0.18    0.75

#1