Replace the last NA in a run of NAs in a column with the last valid value

Here is a sample data frame:

> df = data.frame(rep(seq(0, 120, length.out=6), times = 2), c(sample(1:50, 4), 
+ NA, NA, NA, sample(1:50, 5)))
> colnames(df) = c("Time", "Pat1")
> df
     Time Pat1
1     0   33
2    24   48
3    48    7
4    72    8
5    96   NA
6   120   NA
7     0   NA
8    24    1
9    48    6
10   72   28
11   96   31
12  120   32

The NAs that have to be replaced are identified with which() and logical operators:

x = which(is.na(df$Pat1) & df$Time == 0)

I know the na.locf() function from zoo, but it replaces all NAs. How can I replace only the NAs at the positions in x in a multi-column df?

EDIT: Here is a link to my original dataset: link

And this is how far I have got:

require(reshape2)
require(zoo)

pad.88 <- read.csv2("pad_88.csv")
colnames(pad.88) = c("Time", "Increment", "Side", 4:length(pad.88)-3)
attach(pad.88)

x = which(Time == 240 & Increment != 5)

pad.88 = pad.88[c(1:x[1], x[1]:x[2], x[2]:x[3], x[3]:x[4], x[4]:x[5], x[5]:x[6],x[6]:x[7], x[7]:x[8], x[8]:nrow(pad.88)),] 

y = which(duplicated(pad.88))
pad.88$Time[y] = 0

pad.88$Increment[y] = Increment[x] + 1

z = which(is.na(pad.88[4:ncol(pad.88)] & pad.88$Time == 0), arr.ind=T)
a = na.locf(pad.88[4:ncol(pad.88)])

My next step is something like pat.cols[z] = a[z], which doesn't work.

This is what the result should look like:

Time Increment Side      1       2       3       4       5    ...

150     4       0   27,478  24,076  27,862  20,001  25,261
165     4       0   27,053  24,838  27,231  20,001  NA
180     4       0   27,599  24,166  27,862  20,687  NA
195     4       0   27,114  23,403  27,862  20,001  NA
210     4       0   26,993  24,076  27,189  19,716  NA
225     4       0   26,629  24,21   26,221  19,887  NA
240     4       0   26,811  26,228  26,431  20,001  NA
  0     5       1   26,811  26,228  26,431  20,001  25,261
 15     5       1   ....

The last valid value in col 5 is 25,261. This value should replace the NA at Time 0 in col 5.

1 solution

#1

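You can change x so that it records the positions of all the NA values, then use the first and last of those positions: the last NA is overwritten with the value in the row just before the first NA. Note that this assumes the column contains a single run of NAs.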

df
   Time Pat1
1     0   36
2    24   13
3    48   32
4    72   38
5    96   NA
6   120   NA
7     0   NA
8    24    5
9    48   10
10   72    7
11   96   25
12  120   28

x <- which(is.na(df$Pat1))
df[rev(x)[1],"Pat1"] <- df[x[1]-1,"Pat1"]
df
   Time Pat1
1     0   36
2    24   13
3    48   32
4    72   38
5    96   NA
6   120   NA
7     0   38
8    24    5
9    48   10
10   72    7
11   96   25
12  120   28
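
If a column can contain more than one run of NAs, the first/last trick above no longer picks the right rows. A possible alternative, sketched here under assumptions rather than taken from the original answer, is to carry the last observation forward over the whole column and then copy the filled values back only at the positions selected by the question's condition (NA and Time == 0). The helper name locf is hypothetical; zoo::na.locf(v, na.rm = FALSE) should give the same filled vector.

```r
# Hypothetical helper (not part of the original answer): base-R
# "last observation carried forward" that keeps the vector length.
locf <- function(v) {
  # index of the most recent non-NA element at each position (0 if none yet)
  idx <- cummax(ifelse(is.na(v), 0L, seq_along(v)))
  # prepend NA so that index 0 maps to NA
  c(NA, v)[idx + 1L]
}

# Same values as the example data frame in the answer
df <- data.frame(
  Time = rep(seq(0, 120, length.out = 6), times = 2),
  Pat1 = c(36, 13, 32, 38, NA, NA, NA, 5, 10, 7, 25, 28)
)

# Positions to fill: NA values that sit at Time == 0
x <- which(is.na(df$Pat1) & df$Time == 0)

# Fill only those positions; the other NAs stay untouched
df$Pat1[x] <- locf(df$Pat1)[x]
df
```

With this data, only row 7 changes (it receives 38); rows 5 and 6 remain NA.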

For the multi-column example, use the same idea in an sapply call:

cbind(df[1],sapply(df[-1],function(x) {y<-which(is.na(x));x[rev(y)[1]]<-x[y[1]-1];x}))
   Time Pat1 Pat2
1     0   41   42
2    24    8   30
3    48    3   41
4    72   14   NA
5    96   NA   NA
6   120   NA   NA
7     0   14   41
8    24    5   37
9    48   29   48
10   72   31   11
11   96   50   43
12  120   46   21
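
The same condition-based fill can be applied to every value column at once. The following is a minimal sketch, not the answerer's method: the helpers locf and fill_at are hypothetical names, and the input values are reconstructed from the output shown above on the assumption that row 7 was NA in both columns before the replacement. It uses lapply so the result is assigned back as data frame columns.

```r
# Hypothetical helper: base-R last observation carried forward
locf <- function(v) {
  idx <- cummax(ifelse(is.na(v), 0L, seq_along(v)))
  c(NA, v)[idx + 1L]
}

# Hypothetical helper: fill NAs only where the logical vector `at` is TRUE
fill_at <- function(v, at) {
  filled <- locf(v)
  hit <- is.na(v) & at
  v[hit] <- filled[hit]
  v
}

# Input reconstructed from the output above (assumption: row 7 was NA)
df <- data.frame(
  Time = rep(seq(0, 120, length.out = 6), times = 2),
  Pat1 = c(41, 8, 3, 14, NA, NA, NA, 5, 29, 31, 50, 46),
  Pat2 = c(42, 30, 41, NA, NA, NA, NA, 37, 48, 11, 43, 21)
)

# Apply to every column except Time; rows with Time == 0 are the targets
df[-1] <- lapply(df[-1], fill_at, at = df$Time == 0)
df
```

For the real dataset the value columns would be selected differently (for example pad.88[4:ncol(pad.88)] in the question's code), but the fill step stays the same.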
