R: Fill empty cells with the value of the last non-empty cell

Time: 2021-11-04 23:40:13

In Excel, it is easy to grab a cell within a column and drag the cursor downward to replace many cells below so that each cell becomes the same value as the original.

This function can be performed in R using a for loop. I spent some time trying to figure it out today, and thought I'd share for the benefit of the next person in my shoes:

for (row in seq_along(data$column)[-1]) {  # start at row 2: row 1 has no previous row to copy from
    if (!is.na(data$column[row]) && data$column[row] == "") {  # if it's empty...
        data$column[row] <- data$column[row - 1]  # ...replace with the previous row's value
    }
}

This worked for me, although it took a long time (5-10 minutes) to run on a huge data file. Perhaps there is a more efficient way to do this, and I encourage anyone to say how it could be done.

Thanks and good luck.

2 solutions

#1


7  

df <- data.frame(a = c(1:5, "", 3, "", "", "", 4), stringsAsFactors = FALSE)

> df
   a
1  1
2  2
3  3
4  4
5  5
6   
7  3
8   
9   
10  
11 4

while(length(ind <- which(df$a == "")) > 0){
  df$a[ind] <- df$a[ind -1]
}

> df
   a
1  1
2  2
3  3
4  4
5  5
6  5
7  3
8  3
9  3
10 3
11 4

EDIT: added time profile

set.seed(1)
N = 1e6
df <- data.frame(a = sample(c("",1,2),size=N,replace=TRUE),
                 stringsAsFactors = FALSE)

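# A leading empty cell has no previous value to copy; mark it NA so that ind - 1 never contains index 0
if(df$a[1] == "") {df$a[1] <- NA}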

system.time(
  while(length(ind <- which(df$a == "")) > 0){
    df$a[ind] <- df$a[ind - 1]
  }, gcFirst = TRUE)

user  system elapsed 
0.89    0.00    0.88 
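
The while loop above needs one pass for each position in the longest run of empty cells. As a possible alternative, here is a minimal single-pass sketch in base R. The helper name fill_down is hypothetical, and the sketch assumes a character vector in which empty cells are "" or NA. A leading empty cell has nothing to copy from, so it comes back as NA.

```r
# Hypothetical helper (the name is an assumption): carry the last non-empty value forward.
fill_down <- function(x) {
  filled <- !is.na(x) & x != ""          # TRUE where a cell already holds a value
  idx <- cummax(seq_along(x) * filled)   # position of the most recent filled cell
  idx[idx == 0] <- NA                    # nothing to copy before the first filled cell
  x[idx]
}

a <- c(1:5, "", 3, "", "", "", 4)        # same values as the small example above
fill_down(a)
```

On this input the result should match the filled column printed for the small example: 1 2 3 4 5 5 3 3 3 3 4.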

#2


2  

Here is a fast solution using na.locf from the zoo package, applied within data.table. I created a new column y in the result to better visualize the effect of replacing missing values (it is easy to overwrite the x column instead). Since na.locf only fills missing values (NA), an extra step is needed first to replace all zero-length strings with NA. The solution is fast: the timing below shows about 0.6 seconds of user time (1.78 seconds elapsed) on my machine for 1e6 rows.

library(data.table)
library(zoo)
N=1e6  ##  number of rows 
DT <- data.table(x=sample(c("",1,2),size=N,replace=TRUE))
## na.rm = FALSE keeps leading NAs, so y always has the same length as x
system.time(DT[!nzchar(x), x := NA][, y := na.locf(x, na.rm = FALSE)])
## user  system elapsed 
## 0.59    0.30    1.78 
# x y
# 1:  2 2
# 2: NA 2
# 3: NA 2
# 4:  1 1
# 5:  1 1
# ---     
#   999996:  1 1
# 999997:  2 2
# 999998:  2 2
# 999999: NA 2
# 1000000: NA 2
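
To make the two steps easier to follow, here is a small sketch of the same idea on a plain vector, without data.table. It assumes the zoo package is installed. The input is made up for illustration and starts with an empty cell on purpose, because na.locf drops leading NAs by default (na.rm = TRUE), which would shorten the result.

```r
library(zoo)

x <- c("", 1:5, "", 3, "", "", "", 4)   # illustrative input; note the leading empty cell
x[!nzchar(x)] <- NA                     # step 1: turn zero-length strings into NA

# step 2: carry the last observation forward; na.rm = FALSE keeps the leading NA
filled <- na.locf(x, na.rm = FALSE)
filled
```

With na.rm = FALSE the result has the same length as x, so it can be assigned back to a data frame or data.table column directly.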
