Fill missing column values based on the previous value in the row

Time: 2022-05-03 23:42:03

I have a huge table which basically looks like this:

A  B  C  D  E  F
A  B        &
A  B  C  D     $
A  B  C  @

The processed version should look like this:

A  B  C  D  E  F
A  B  B& B& B& B&
A  B  C  D  D$ D$
A  B  C  C@ C@ C@

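The task is to concatenate the value of the last non-empty cell in a row with the value of the previous non-empty cell in the same row, and to use the new value to fill that row's empty cells. As the expected output shows, the cell that held the last value is overwritten with the combined value as well.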

Any suggestions how to do this in R?

3 answers

#1


3  

Here is one option that loops over the rows of the dataset. For each row, we keep the elements that are not blank ('x1') and paste the last two of them together ('x2'). If the row has any blanks, we concatenate all non-blank values except the last one (head(x1,-1)) with 'x2', repeated enough times to pad the row back to the number of columns of 'df1' (ncol(df1)-length(x1)+1); otherwise the row is returned unchanged. The result is transposed (t) and converted to a data.frame.

 m1 <- t(apply(df1, 1, function(x) {
          x1 <- x[x!=''] #elements that are not-blank
          x2 <- paste(tail(x1,2), collapse='') #paste  the last two non-blank
          if(any(x=='')) #if there is any blank value
          c(head(x1,-1), rep(x2, ncol(df1)-length(x1)+1)) #concatenate
          else x #else return the row
           }))

 as.data.frame(m1, stringsAsFactors=FALSE)
 #  V1 V2 V3 V4 V5 V6
 #1  A  B  C  D  E F
 #2  A  B B& B& B& B&
 #3  A  B  C  D D$ D$
 #4  A  B  C C@ C@ C@

data

 df1 <- structure(list(v1 = c("A", "A", "A", "A"), v2 = c("B", "B", "B", 
 "B"), v3 = c("C", "", "C", "C"), v4 = c("D", "", "D", "@"), v5 = c("E", 
 "&", "", ""), v6 = c("F", "", "$", "")), .Names = c("v1", "v2", 
 "v3", "v4", "v5", "v6"), class = "data.frame", row.names = c(NA, -4L))
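
The per-row logic described above can also be pulled out into a small standalone function, which makes it easier to check one row at a time. This is only a sketch: the name fill_row is hypothetical, and the guard for rows that are entirely blank is an added assumption that is not part of the code above.

```r
# Hypothetical helper: fill the blanks of one row (a character vector).
fill_row <- function(x) {
  filled <- x[x != ""]                        # non-blank cells, in order
  # Nothing to do if the row has no blanks (or, by assumption, no values at all)
  if (length(filled) == length(x) || length(filled) == 0) return(x)
  combo <- paste(tail(filled, 2), collapse = "")   # e.g. "B" and "&" give "B&"
  # Keep everything but the last non-blank value, then pad with the combo
  c(head(filled, -1), rep(combo, length(x) - length(filled) + 1))
}

df1 <- data.frame(
  v1 = c("A", "A", "A", "A"), v2 = c("B", "B", "B", "B"),
  v3 = c("C", "", "C", "C"),  v4 = c("D", "", "D", "@"),
  v5 = c("E", "&", "", ""),   v6 = c("F", "", "$", ""),
  stringsAsFactors = FALSE
)

fill_row(c("A", "B", "", "", "&", ""))
# [1] "A"  "B"  "B&" "B&" "B&" "B&"

res <- as.data.frame(t(apply(df1, 1, fill_row)), stringsAsFactors = FALSE)
```

Keeping the row logic in its own function also makes it easy to swap in a different rule later without touching the apply call.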

#2


1  

This problem is a natural fit for na.locf (last observation carried forward) from the zoo package:

First, replace "" with NA: x[sapply(x,function(y)y=="")]<-NA

Strip symbols:

x.no.sym<-x
x.no.sym[sapply(x.no.sym,function(y)!y%in%LETTERS)]<-NA

Fill out the letters:

library(zoo) # provides na.locf
x.no.sym.fill<-t(apply(x.no.sym,1,na.locf))
     V1  V2  V3  V4  V5  V6 
[1,] "A" "B" "C" "D" "E" "F"
[2,] "A" "B" "B" "B" "B" "B"
[3,] "A" "B" "C" "D" "D" "D"
[4,] "A" "B" "C" "C" "C" "C"
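
For readers who want to see what the carry-forward step does, it can be sketched in base R. The helper name locf is hypothetical, and this is only a stand-in for the idea behind na.locf, not zoo's implementation. One difference to be aware of: this sketch keeps leading NAs in place (similar to calling na.locf with na.rm=FALSE), whereas na.locf drops them by default.

```r
# Hypothetical base-R stand-in for "last observation carried forward".
locf <- function(v) {
  idx <- cumsum(!is.na(v))   # position of the latest non-NA value seen so far
  idx[idx == 0] <- NA        # leading NAs have nothing to carry forward
  v[!is.na(v)][idx]          # look up that value for every position
}

locf(c("A", "B", NA, NA, NA, NA))
# [1] "A" "B" "B" "B" "B" "B"
```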

Now fill out the symbols and delete the letters:

x.sym.fill<-t(apply(x,1,function(y)na.locf(na.locf(y,fromLast=TRUE,na.rm=FALSE),na.rm=FALSE)))
x.sym.fill[sapply(x.sym.fill,function(y)y%in%LETTERS)]<-""
     V1 V2 V3  V4  V5  V6 
[1,] "" "" ""  ""  ""  "" 
[2,] "" "" "&" "&" "&" "&"
[3,] "" "" ""  ""  "$" "$"
[4,] "" "" ""  "@" "@" "@"

Now concatenate:

> matrix(paste0(x.no.sym.fill,x.sym.fill),ncol=ncol(x))

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "A"  "B"  "C"  "D"  "E"  "F" 
[2,] "A"  "B"  "B&" "B&" "B&" "B&"
[3,] "A"  "B"  "C"  "D"  "D$" "D$"
[4,] "A"  "B"  "C"  "C@" "C@" "C@"

#3


0  

This one seems fun. I treated the blank cells in the data frame as "" and called the data frame df.

fill = apply(df, 1, function(x) { 
  x = x[x != ""]
  paste(tail(x, 2), collapse = "")
})

df[df == ""] = matrix(fill, ncol = ncol(df), nrow = nrow(df))[df == ""]

For each row, find its single filler value, build a matrix with the same dimensions as the original data frame that holds those fill values, then cherry-pick only the values at the positions you need to replace.

df = structure(list(A = c("A", "A", "A"), B = c("B", "B", "B"), C = c("", 
"C", "C"), D = c("", "D", "@"), E = c("&", "", ""), F = c("", 
"$", "")), .Names = c("A", "B", "C", "D", "E", "F"), row.names = c(NA, 
-3L), class = "data.frame")
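
One thing to watch: as written, the replacement touches only the blank cells, so the cell that holds the symbol itself keeps its bare value (the first row comes out as A B B& B& & B&). If the result should match the expected output in the question exactly, one possible adjustment is to overwrite the last non-blank cell too, for rows that had blanks. The sketch below works on a character matrix and assumes every row has at least one non-blank cell; the names m, blank, last_col and target are made up for illustration.

```r
df <- data.frame(
  A = c("A", "A", "A"), B = c("B", "B", "B"), C = c("", "C", "C"),
  D = c("", "D", "@"),  E = c("&", "", ""),   F = c("", "$", ""),
  stringsAsFactors = FALSE
)

m <- as.matrix(df)
blank <- m == ""                     # cells that need filling
has_blank <- rowSums(blank) > 0      # rows that need any work at all

# Filler per row: the last two non-blank values pasted together
fill <- apply(m, 1, function(x) paste(tail(x[x != ""], 2), collapse = ""))

# Column of the last non-blank cell in each row (assumes one exists)
last_col <- apply(!blank, 1, function(z) max(which(z)))

# Replace the blanks plus, for rows with blanks, the last non-blank cell
target <- blank
target[cbind(which(has_blank), last_col[has_blank])] <- TRUE
m[target] <- matrix(fill, nrow = nrow(m), ncol = ncol(m))[target]

m[1, ]
#    A    B    C    D    E    F
#  "A"  "B" "B&" "B&" "B&" "B&"
```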
