I have, for example, these three datasets (in my real case there are many more, each with a lot of variables):
data_frame1 <- data.frame(a=c(1,5,3,3,2), b=c(3,6,1,5,5), c=c(4,4,1,9,2))
data_frame2 <- data.frame(a=c(6,0,9,1,2), b=c(2,7,2,2,1), c=c(8,4,1,9,2))
data_frame3 <- data.frame(a=c(0,0,1,5,1), b=c(4,1,9,2,3), c=c(2,9,7,1,1))
To each data frame I want to add a variable that results from transforming an existing variable in that data frame. I would like to do this with a loop. For example:
datasets <- c("data_frame1","data_frame2","data_frame3")
vars <- c("a","b","c")
for (i in datasets){
  for (j in vars){
    # here I need code that creates a new variable with the transformed values
    # I thought this would work, but it didn't...
    get(i)$new_var <- log(get(i)[,j])
  }
}
Do you have any suggestions on how to do this?
It would also be great if the new column names (in this case new_var) could be assigned from a character string, so that I could create the new variables with another for loop nested inside the other two.
I hope my explanation of the problem was not too tangled.
Thanks in advance.
1 Answer
#1
5
You can put your data frames in a list and use lapply to process them one by one, so there is no need for an explicit loop over the datasets in this case.
For example, you can do this:
data_frame1 <- data.frame(a=c(1,5,3,3,2), b=c(3,6,1,5,5), c=c(4,4,1,9,2))
data_frame2 <- data.frame(a=c(6,0,9,1,2), b=c(2,7,2,2,1), c=c(8,4,1,9,2))
data_frame3 <- data.frame(a=c(0,0,1,5,1), b=c(4,1,9,2,3), c=c(2,9,7,1,1))
ll <- list(data_frame1,data_frame2,data_frame3)
lapply(ll, function(df) {
  df$log_a <- log(df$a)               ## new column: natural log of a
  df$tans_col <- df$a + df$b + df$c   ## new column: sum of a, b and c, or any other transformation
  ## ... further transformations ...
  df                                  ## return the modified data frame
})
The first element of the returned list, the modified copy of data_frame1, is:
[[1]]
  a b c     log_a tans_col
1 1 3 4 0.0000000        8
2 5 6 4 1.6094379       15
3 3 1 1 1.0986123        5
4 3 5 9 1.0986123       17
5 2 5 2 0.6931472        9
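The question also asked for column names built from character strings and for a loop over several variables. The answer above does not cover that, so here is a minimal sketch of one way it might be done. It assumes the datasets and vars vectors from the question; the "log_" prefix for the new names is just an illustrative choice. The loop in the question most likely fails because R cannot assign into the result of a call such as get(i). Indexing with [[ ]] accepts a character string, which avoids the problem.

```r
data_frame1 <- data.frame(a = c(1, 5, 3, 3, 2), b = c(3, 6, 1, 5, 5), c = c(4, 4, 1, 9, 2))
data_frame2 <- data.frame(a = c(6, 0, 9, 1, 2), b = c(2, 7, 2, 2, 1), c = c(8, 4, 1, 9, 2))
data_frame3 <- data.frame(a = c(0, 0, 1, 5, 1), b = c(4, 1, 9, 2, 3), c = c(2, 9, 7, 1, 1))

datasets <- c("data_frame1", "data_frame2", "data_frame3")
vars <- c("a", "b", "c")

# mget() looks the objects up by name and returns a named list.
# environment() is the current environment (the workspace at top level).
ll <- mget(datasets, envir = environment())

ll <- lapply(ll, function(df) {
  for (v in vars) {
    new_name <- paste0("log_", v)    # illustrative naming scheme (an assumption)
    df[[new_name]] <- log(df[[v]])   # [[ ]] takes the column name as a string
  }
  df                                 # return the modified data frame
})

# Optional: copy the modified data frames back over the original variables.
list2env(ll, envir = environment())

names(data_frame1)
```

Note that lapply works on copies, so the original data frames only change if the result is assigned back, as in the optional list2env step. Zeros in the data give -Inf under log, which may need handling in the real datasets.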