This question already has an answer here:
- How to implement coalesce efficiently in R (7 answers)
EDIT: Agreed, this is a duplicate of "How to implement coalesce efficiently in R". I didn't realize my problem was more general than my specific application, so this discussion has been great.
Sometimes the response variable in a randomized experiment is stored in a different column for each experimental group (Y_1 through Y_5 in the code below). It's often best to collect the response variable into a single column (Y_all). I end up doing it as in the example below, but I'm SURE there's a better way. Thoughts?
set.seed(343)
N <- 1000
group <- sample(1:5, N, replace=TRUE)
Y_1 <- ifelse(group==1, rbinom(sum(group==1), 1, .5), NA)
Y_2 <- ifelse(group==2, rbinom(sum(group==2), 1, .5), NA)
Y_3 <- ifelse(group==3, rbinom(sum(group==3), 1, .5), NA)
Y_4 <- ifelse(group==4, rbinom(sum(group==4), 1, .5), NA)
Y_5 <- ifelse(group==5, rbinom(sum(group==5), 1, .5), NA)
## This is the part I want to make more efficient
Y_all <- ifelse(!is.na(Y_1), Y_1,
         ifelse(!is.na(Y_2), Y_2,
         ifelse(!is.na(Y_3), Y_3,
         ifelse(!is.na(Y_4), Y_4,
         ifelse(!is.na(Y_5), Y_5,
                NA)))))
table(Y_all, Y_1, exclude = NULL)
table(Y_all, Y_2, exclude = NULL)
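Because `group` already records which column holds each unit's response, one option is to keep the five columns in a matrix and index it with (row, group) pairs. This is a self-contained sketch that assumes, as in the simulation above, exactly one observed column per row; it regenerates the data with `rbinom(N, 1, 0.5)` per column, so the values will not match the objects above.

```r
set.seed(343)
N <- 1000
group <- sample(1:5, N, replace = TRUE)
# Column k is observed only when group == k (same layout as Y_1..Y_5)
Ymat <- sapply(1:5, function(k) ifelse(group == k, rbinom(N, 1, 0.5), NA))
colnames(Ymat) <- paste0("Y_", 1:5)

# group gives the column holding each row's response, so a two-column
# (row, column) index matrix picks those cells out in one step
Y_all <- Ymat[cbind(seq_len(N), group)]
```

This avoids the nested `ifelse()` calls entirely, but it only works because the group label doubles as the column index.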
3 Answers
#1
I like to use a coalesce() function for this:
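# available from https://gist.github.com/MrFlick/10205794
coalesce <- function(...) {
  # Convert factors to character so values can be copied between arguments
  x <- lapply(list(...), function(z) { if (is.factor(z)) as.character(z) else z })
  m <- is.na(x[[1]])  # positions still missing in the result
  i <- 2
  # Fill missing positions from each later argument until none remain
  while (any(m) && i <= length(x)) {
    if (length(x[[i]]) == length(x[[1]])) {
      x[[1]][m] <- x[[i]][m]
    } else if (length(x[[i]]) == 1) {
      x[[1]][m] <- x[[i]]  # a length-1 argument fills every missing position
    } else {
      stop(paste("length mismatch in argument", i, " - found:", length(x[[i]]),
                 "expected:", length(x[[1]])))
    }
    m <- is.na(x[[1]])
    i <- i + 1
  }
  return(x[[1]])
}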
Then you can do
Y_all <- coalesce(Y_1,Y_2,Y_3,Y_4,Y_5)
Of course, this is very specific to getting the first non-NA value.
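For comparison, the same first-non-NA logic can be sketched more compactly with base R's `Reduce()`. The name `coalesce_reduce` is hypothetical (it is not the gist's function); unlike the gist, it does not convert factors and it visits every argument instead of stopping once nothing is missing. Packages such as dplyr also export a `coalesce()` function.

```r
# Hypothetical helper (not from the gist): first non-NA value across arguments
coalesce_reduce <- function(...) {
  Reduce(function(acc, nxt) {
    m <- is.na(acc)              # positions still missing
    if (length(nxt) == 1) {
      acc[m] <- nxt              # a length-1 argument fills every gap
    } else {
      stopifnot(length(nxt) == length(acc))
      acc[m] <- nxt[m]           # take the next argument's values at the gaps
    }
    acc
  }, list(...))
}

# Example: returns c(1, 2, 3, NA)
coalesce_reduce(c(1, NA, NA, NA), c(NA, 2, NA, NA), c(NA, 5, 3, NA))
```

With the question's data this would be called as `Y_all <- coalesce_reduce(Y_1, Y_2, Y_3, Y_4, Y_5)`.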
#2
I think in this case you can use the melt() function from reshape2 to convert the data to long format and then drop the missing values:
library(reshape2)
set.seed(10)
N <- 1000
group <- sample(1:5, N, replace=TRUE)
Y_1 <- ifelse(group==1, rbinom(sum(group==1), 1, .5), NA)
Y_2 <- ifelse(group==2, rbinom(sum(group==2), 1, .5), NA)
Y_3 <- ifelse(group==3, rbinom(sum(group==3), 1, .5), NA)
Y_4 <- ifelse(group==4, rbinom(sum(group==4), 1, .5), NA)
Y_5 <- ifelse(group==5, rbinom(sum(group==5), 1, .5), NA)
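Y_all = data.frame(group, Y_1, Y_2, Y_3, Y_4, Y_5)
# Long format: one row per (unit, Y_k) pair; "variable" records which Y_k
Y_all.m = melt(Y_all, id.vars = "group")
# Keep only the observed responses
Y_all.m = Y_all.m[!is.na(Y_all.m$value), ]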
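The same reshaping can be sketched in base R without reshape2 (reshape2's `melt()` also accepts `na.rm = TRUE` to drop missing values during the melt). Note that melted output is grouped by `variable`, not by unit. The sketch below adds an `id` column, which is an assumption not in the answer, so the observed values can be returned to the original unit order.

```r
set.seed(10)
N <- 1000
group <- sample(1:5, N, replace = TRUE)
# Same layout as above: column k is observed only when group == k
Ymat <- sapply(1:5, function(k) ifelse(group == k, rbinom(N, 1, 0.5), NA))
colnames(Ymat) <- paste0("Y_", 1:5)

# Long format by hand; c(Ymat) stacks the matrix column by column
long <- data.frame(
  id       = rep(seq_len(N), times = ncol(Ymat)),  # hypothetical unit id
  group    = rep(group, times = ncol(Ymat)),
  variable = rep(colnames(Ymat), each = N),
  value    = c(Ymat),
  stringsAsFactors = FALSE
)
# Drop the structurally missing cells, then restore the original unit order
long <- long[!is.na(long$value), ]
long <- long[order(long$id), ]
Y_all <- long$value
```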
#3
Store the vectors in a matrix and then select:
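Ymat <- cbind(Y_1, Y_2, Y_3, Y_4, Y_5)
# Column index of the non-NA entry in each row
# (assumes exactly one non-NA value per row, as in the question's data)
mycol <- apply(!is.na(Ymat), 1, which)
# Pick element [row, mycol[row]] for every row via a two-column index matrix
Y_all.f <- Ymat[cbind(1:nrow(Ymat), mycol)]
identical(Y_all, Y_all.f) # TRUE, with Y_all from the question's nested ifelse()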
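The `apply(..., which)` step relies on each row having exactly one non-NA value. A sketch that also copes with rows having several or no observed values uses base R's `max.col()` with `ties.method = "first"`, which returns the leftmost column holding the row maximum. For an all-NA row every column ties at 0, so column 1 is chosen and the result is NA. The small matrix here is made up for illustration.

```r
# Toy matrix: row 2 has two non-NA values, row 3 has none
Ymat <- cbind(Y_1 = c(1L, NA, NA, NA),
              Y_2 = c(NA, 0L, NA, NA),
              Y_3 = c(NA, 1L, NA, 1L))

# 1 where a value is observed, 0 where it is missing (parentheses matter:
# `!` binds more loosely than `*` in R)
observed <- (!is.na(Ymat)) * 1
# Leftmost observed column per row; column 1 for rows with nothing observed
first_col <- max.col(observed, ties.method = "first")
Y_all.f <- Ymat[cbind(seq_len(nrow(Ymat)), first_col)]
# Y_all.f is c(1L, 0L, NA, 1L)
```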