I'm new to R and I'm trying to sum two columns of a data frame row by row, but only where both elements to be summed satisfy a given condition. To make things clear, this is what I want to do:
> t.d<-as.data.frame(matrix(1:9,ncol=3))
> t.d
  V1 V2 V3
1  1  4  7
2  2  5  8
3  3  6  9
> t.d$V4<-rep(0,nrow(t.d))
> for (i in 1:nrow(t.d)){
+ if (t.d$V1[i]>1 && t.d$V3[i]<9){
+ t.d$V4[i]<-t.d$V1[i]+t.d$V3[i]}
+ }
> t.d
  V1 V2 V3 V4
1  1  4  7  0
2  2  5  8 10
3  3  6  9  0
I need efficient code, as my real data frame has about 150,000 rows and 200 columns. This attempt gives an error:
t.d$V4<-t.d$V1[t.d$V1>1]+ t.d$V3[t.d$V3>9]
Is "apply" an option? I tried this:
t.d<-as.data.frame(matrix(1:9,ncol=3))
t.d$V4<-rep(0,nrow(t.d))
my.fun<-function(x,y){
if(x>1 && y<9){
x+y}
}
t.d$V4<-apply(X=t.d,MAR=1,FUN=my.fun,x=t.d$V1,y=t.d$V3)
but it gives an error as well. Thanks very much for your help.
3 solutions
#1
39
This operation doesn't require loops, apply statements or if statements. Vectorised operations and subsetting are all you need:
t.d <- within(t.d, V4 <- V1 + V3)
t.d[!(t.d$V1>1 & t.d$V3<9), "V4"] <- 0
t.d
V1 V2 V3 V4
1 1 4 7 0
2 2 5 8 10
3 3 6 9 0
Why does this work?
In the first step I create a new column that is the straight sum of columns V1 and V3. I use within as a convenient way of referring to the columns of t.d without having to write t.d$V all the time.
In the second step I subset all of the rows that don't fulfill your conditions and set V4 for these to 0.
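To make the two steps visible, here is a small sketch on the sample data from the question. The intermediate variable keep is a name introduced here for illustration only; it is not part of the answer above.

```r
# Sample data from the question
t.d <- as.data.frame(matrix(1:9, ncol = 3))

# Step 1: unconditional row-wise sum of V1 and V3
t.d <- within(t.d, V4 <- V1 + V3)

# Step 2: logical vector, TRUE where both conditions hold
# ('keep' is a hypothetical helper name)
keep <- t.d$V1 > 1 & t.d$V3 < 9
print(keep)

# Rows where the condition fails get V4 reset to 0
t.d[!keep, "V4"] <- 0
print(t.d)
```

Storing the condition once also avoids evaluating it twice if it is needed again later.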
#2
25
ifelse is your friend here:
t.d$V4<-ifelse((t.d$V1>1)&(t.d$V3<9), t.d$V1+ t.d$V3, 0)
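Note that this uses the element-wise operator & rather than &&. The loop in the question can use && because it tests one row at a time; && is meant for single TRUE/FALSE values, whereas & returns one result per row. A minimal sketch on the sample data (cond is a name made up for this illustration):

```r
t.d <- as.data.frame(matrix(1:9, ncol = 3))

# '&' compares element by element and returns one logical per row
cond <- (t.d$V1 > 1) & (t.d$V3 < 9)   # 'cond' is a hypothetical name
print(cond)

# ifelse takes the sum where cond is TRUE and 0 everywhere else
t.d$V4 <- ifelse(cond, t.d$V1 + t.d$V3, 0)
print(t.d$V4)
```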
#3
9
I'll chip in and provide yet another version. Since you want zero if the condition doesn't match, and TRUE/FALSE are glorified versions of 1/0, simply multiplying by the condition also works:
t.d<-as.data.frame(matrix(1:9,ncol=3))
t.d <- within(t.d, V4 <- (V1+V3)*(V1>1 & V3<9))
...and it happens to be faster than the other solutions ;-)
t.d <- data.frame(V1=runif(2e7, 1, 2), V2=1:2e7, V3=runif(2e7, 5, 10))
system.time( within(t.d, V4 <- (V1+V3)*(V1>1 & V3<9)) ) # 3.06 seconds
system.time( ifelse((t.d$V1>1)&(t.d$V3<9), t.d$V1+ t.d$V3, 0) ) # 5.08 seconds
system.time( { t.d <- within(t.d, V4 <- V1 + V3);
t.d[!(t.d$V1>1 & t.d$V3<9), "V4"] <- 0 } ) # 4.50 seconds