有映射表的R过滤表可以减少NA的值

时间:2021-10-12 18:33:33

I would like to filter a table bsp1

我想过滤一个bsp1表

g <- factor(c("Company 1", "Company 2", "Company 3", "Company 4", "Company 5"))
w <- factor(c("a", "b", "c", "a", "c"))
x <- c(28, 18, 25, NA, 21)
y <- c(80, NA, 74, 101, NA)
z <- c(170, 174, 183, NA, 185)
bsp1 <- data.frame(g, w, x, y, z)
colnames(bsp1) <- c("Company", "Sector", "Item 1", "Item 2", "Item 3")
rm(w, x, y, z)
bsp1

#    Company Sector Item 1 Item 2 Item 3
# 1 Company 1      a     28     80    170
# 2 Company 2      b     18     NA    174
# 3 Company 3      c     25     74    183
# 4 Company 4      a     NA    101     NA
# 5 Company 5      c     21     NA    185

based on a mapping table bsp2

基于映射表bsp2

sector <- factor(c("a", "b", "c"))
a <- c(1, 1, 1)
b <- c(NA, 1, NA)
c <- c(NA, NA, 1)
bsp2 <- data.frame(sector, a, b, c)
colnames(bsp2)  <- c("Sector", "Item 1", "Item 2", "Item 3")
bsp2

#   Sector Item 1 Item 2 Item 3
# 1      a      1     NA     NA
# 2      b      1      1     NA
# 3      c      1     NA      1

the filter rule should be: For each sector in bsp2 for which the item is not NA, the rows in bsp1 from the respective sector with NA in the item should be removed.

过滤规则应该是:对于bsp2中不是NA的每个扇区,应该删除bsp1中包含NA的扇区中的行。

The desired outcome looks therefore like bsp3

因此,期望的结果看起来像bsp3

bsp3 <- bsp1[c(1, 3, 5),]
bsp3

#    Company Sector Item 1 Item 2 Item 3
# 1 Company 1      a     28     80    170
# 3 Company 3      c     25     74    183
# 5 Company 5      c     21     NA    185

Company 2 was dropped, as item 2 is required for b. Line 5 was not dropped as item 2 is not required for sector 3.

公司2被删除,因为b需要项目2。第5行没有被删除,因为第3扇区不需要项目2。

I thought about an apply function for all sectors over bsp1 with a vector of the relevant columns derived with following function for the vector.

我想到了一个适用于bsp1上的所有扇区的函数,它有一个相关列的向量,这个向量有如下的函数。

a <- !is.na(bsp2[1,])
a <- which(a==c("TRUE"))

Unfortunately in the filter test of one vector I have a problem that b is interpreted as a Matrix.

不幸的是,在一个向量的过滤测试中,我有一个问题b被解释为一个矩阵。

b <- is.na(bsp1[, a]) 
c <- which(b==c("TRUE"))
reduced2016b <- data2013[-c,]

Even if I set up the vector manually, the filter result is not how I expect it. Would be great if anyone has an idea to solve this problem. Also partial steps would help me already.

即使我手动设置了向量,过滤结果也不是我期望的那样。如果有人有办法解决这个问题,那就太好了。同样,部分步骤也会帮助我。

Thank you in advance!

提前谢谢你!

2 个解决方案

#1


0  

a very similar answer, to above, was just finishing it when it was being posted.

一个和上面非常相似的答案,是在发布的时候刚刚完成的。

you may want to define your data frames like so

您可能希望像这样定义数据帧

    bsp1 <- data.frame(Company=factor(c("Company 1", "Company 2", "Company  3", "Company 4", "Company 5")),
                       Sector=factor(c("a", "b", "c", "a", "c")),
                       Item1=c(28, 18, 25, NA, 21), 
                       Item2=c(80, NA, 74, 101, NA),
                       Item3= c(170, 174, 183, NA, 185))

and

    bsp2 <- data.frame(Sector=factor(c("a", "b", "c")), 
               Item1=c(1, 1, 1), 
               Item2=c(NA, 1, NA), 
               Item3=c(NA, NA, 1))

then melt and merge bsp1 and bsp2

然后熔化并合并bsp1和bsp2

    bsp1m <- melt(bsp1,id.vars=c("Company","Sector"))

    bsp2m <- melt(bsp2,id.vars="Sector", value.name = "flag") 

    bsp3m <- merge(bsp1m,bsp2m,by=c("Sector","variable"))

and finally subset bsp1 based on for where you have an NA in bsp1m and a 1 in bsp2m

最后是bsp1的子集bsp2中有一个NA和一个1

    bsp3 <- bsp1[!bsp1$Company %in% bsp3m$Company[is.na(bsp3m$value) & !is.na(bsp3m$flag)],]

#2


2  

One way is to melt both data frames, merge by Sector and variable, and find which companies have NA for value.x and not NA for value.y, i.e.

一种方法是融化这两种数据框架,按部门和变量合并,找出哪些公司的价值是天生的。x而不是NA的值。y,即。

library(reshape2)
new_df <- merge(melt(bsp1), melt(bsp2), by = c('Sector', 'variable'))
ind <- as.character(new_df$Company[is.na(new_df$value.x) & !is.na(new_df$value.y)])
bsp1[!bsp1$Company %in% ind,]

#    Company Sector Item 1 Item 2 Item 3
#1 Company 1      a     28     80    170
#3 Company 3      c     25     74    183
#5 Company 5      c     21     NA    185

#1


0  

a very similar answer, to above, was just finishing it when it was being posted.

一个和上面非常相似的答案,是在发布的时候刚刚完成的。

you may want to define your data frames like so

您可能希望像这样定义数据帧

    bsp1 <- data.frame(Company=factor(c("Company 1", "Company 2", "Company  3", "Company 4", "Company 5")),
                       Sector=factor(c("a", "b", "c", "a", "c")),
                       Item1=c(28, 18, 25, NA, 21), 
                       Item2=c(80, NA, 74, 101, NA),
                       Item3= c(170, 174, 183, NA, 185))

and

    bsp2 <- data.frame(Sector=factor(c("a", "b", "c")), 
               Item1=c(1, 1, 1), 
               Item2=c(NA, 1, NA), 
               Item3=c(NA, NA, 1))

then melt and merge bsp1 and bsp2

然后熔化并合并bsp1和bsp2

    bsp1m <- melt(bsp1,id.vars=c("Company","Sector"))

    bsp2m <- melt(bsp2,id.vars="Sector", value.name = "flag") 

    bsp3m <- merge(bsp1m,bsp2m,by=c("Sector","variable"))

and finally subset bsp1 based on for where you have an NA in bsp1m and a 1 in bsp2m

最后是bsp1的子集bsp2中有一个NA和一个1

    bsp3 <- bsp1[!bsp1$Company %in% bsp3m$Company[is.na(bsp3m$value) & !is.na(bsp3m$flag)],]

#2


2  

One way is to melt both data frames, merge by Sector and variable, and find which companies have NA for value.x and not NA for value.y, i.e.

一种方法是融化这两种数据框架,按部门和变量合并,找出哪些公司的价值是天生的。x而不是NA的值。y,即。

library(reshape2)
new_df <- merge(melt(bsp1), melt(bsp2), by = c('Sector', 'variable'))
ind <- as.character(new_df$Company[is.na(new_df$value.x) & !is.na(new_df$value.y)])
bsp1[!bsp1$Company %in% ind,]

#    Company Sector Item 1 Item 2 Item 3
#1 Company 1      a     28     80    170
#3 Company 3      c     25     74    183
#5 Company 5      c     21     NA    185