Creating a dataframe based on two data frames

Time: 2022-12-30 22:54:35

I would like to create a dataframe from two existing dataframes, choosing each row according to a dummy variable that is common to both of them: if the dummy is 1, take the row from dataframe a; if the dummy is 0, take the row from dataframe b. This is what they look like:

a:

var1   var2    var3   dummy
ax1     ay1    az1    1
ax2     ay2    az1    0
ax3     ay3    az1    1

b:

var1   var2    var3   dummy
bx1     by1    bz1    1
bx2     by2    bz1    0
bx3     by3    bz1    1

My goal is to obtain a new dataframe whose rows are selected according to the dummy, like this:

c:

var1   var2    var3   dummy
ax1     ay1    az1    1
bx2     by2    bz1    0
ax3     ay3    az1    1

I am working on a cumbersome loop right now, but I am wondering whether there is a simpler way, perhaps within the apply family?

2 solutions

#1


Hm, I would just use a simple rbind with conditionals:

new_df <- rbind(a[a$dummy == 1,], b[b$dummy == 0,])

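That should output the following. Note that rbind stacks the rows taken from a before the rows taken from b, so the row order differs from the c shown in the question:

var1   var2    var3   dummy
ax1     ay1    az1    1
ax3     ay3    az1    1
bx2     by2    bz1    0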

As a side note, you very rarely need explicit loops in R. Odds are, if you are using a loop, there is a better, more idiomatic (usually vectorized) R way to do it.
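Since rbind puts all rows from a ahead of the rows from b, here is a minimal sketch of one way to keep the original row order instead: start from a copy of a and overwrite the rows where the dummy is 0 with the matching rows of b. It assumes, as in the question, that a and b have the same number of rows, the same columns, and the same dummy values; the names `use_b` and `res` are made up for this illustration.

```r
# Example data as described in the question
a <- data.frame(var1 = c("ax1", "ax2", "ax3"),
                var2 = c("ay1", "ay2", "ay3"),
                var3 = c("az1", "az1", "az1"),
                dummy = c(1, 0, 1),
                stringsAsFactors = FALSE)
b <- data.frame(var1 = c("bx1", "bx2", "bx3"),
                var2 = c("by1", "by2", "by3"),
                var3 = c("bz1", "bz1", "bz1"),
                dummy = c(1, 0, 1),
                stringsAsFactors = FALSE)

# Assumption: the dummy is identical in a and b, so one logical index serves both
use_b <- a$dummy == 0

# Start from a, then replace the flagged rows with the same rows of b
res <- a
res[use_b, ] <- b[use_b, ]
res
```

Because the rows are replaced in place rather than appended, the row order and row names of a are kept, which matches the layout of c in the question.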

#2


Try this subsetting strategy.

# Data
a <- data.frame(var1 = c("ax1", "ax2", "ax3"),
                var2 = c("ay1", "ay2", "ay3"),
                var3 = c("az1", "az1", "az1"),
                dummy = c(1, 0, 1),
                stringsAsFactors = FALSE)
b <- data.frame(var1 = c("bx1", "bx2", "bx3"),
                var2 = c("by1", "by2", "by3"),
                var3 = c("bz1", "bz1", "bz1"),
                dummy = c(1, 0, 1),
                stringsAsFactors = FALSE)

sa <- as.logical(a$dummy)     # TRUE where dummy is 1: use this to subset a
sb <- as.logical(1 - a$dummy) # TRUE where dummy is 0: use this to subset b
c <- rbind(a[sa, ], b[sb, ])  # rows from a come first, then rows from b
# Output
#  var1 var2 var3 dummy
#1  ax1  ay1  az1     1
#3  ax3  ay3  az1     1
#2  bx2  by2  bz1     0
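Since the question asks about the apply family, here is a rough sketch of a column-wise variant using Map (a wrapper around mapply) together with ifelse. It walks over the columns of a and b in pairs and picks each value from a where the dummy is 1 and from b otherwise. It assumes both data frames have the same columns in the same order and that the columns are plain character or numeric vectors (ifelse does not preserve factors); the names `pick` and `res` are only for illustration.

```r
# Example data as described in the question
a <- data.frame(var1 = c("ax1", "ax2", "ax3"),
                var2 = c("ay1", "ay2", "ay3"),
                var3 = c("az1", "az1", "az1"),
                dummy = c(1, 0, 1),
                stringsAsFactors = FALSE)
b <- data.frame(var1 = c("bx1", "bx2", "bx3"),
                var2 = c("by1", "by2", "by3"),
                var3 = c("bz1", "bz1", "bz1"),
                dummy = c(1, 0, 1),
                stringsAsFactors = FALSE)

# TRUE where the value should come from a
pick <- a$dummy == 1

# Pair up the columns of a and b; choose element-wise with ifelse
res <- as.data.frame(
  Map(function(col_a, col_b) ifelse(pick, col_a, col_b), a, b),
  stringsAsFactors = FALSE
)
res
```

This keeps the original row order as well, though for this particular task plain logical subsetting is probably the simpler choice.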
