I have over 20 data.frames with the same columns but differing numbers of rows. My goal is to merge the data.frames by the column "Name" (each value is a list of five names), and while merging I would like rows with the same Name to be combined by summing column A, summing column B, and taking the mean of column C.
Here is what I am currently doing.
First, I merge two data.frames at a time:
DF <- merge(x = abc, y = def, by = "Name", all = TRUE)
The merged DF looks like this:
Name A.x B.x C.x A.y B.y C.y
name1,name2,name3,name4,name5 11 24 7 NA NA NA
name1,name3,name4,name6,name7 4 8 12 3 4 8
name1,name2,name5,name6,name7 12 4 5 NA NA NA
name3,name4,name5,name6,name7 NA NA NA 15 3 28
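To make the example reproducible, here is a sketch of `abc` and `def` reconstructed from the tables in this question (the rbind example further down lists the same five rows). Note that `merge()` sorts the result by `Name` by default, so the row order may differ from the table above.

```r
# Example inputs reconstructed from the tables in the question
abc <- data.frame(
  Name = c("name1,name2,name3,name4,name5",
           "name1,name3,name4,name6,name7",
           "name1,name2,name5,name6,name7"),
  A = c(11, 4, 12),
  B = c(24, 8, 4),
  C = c(7, 12, 5),
  stringsAsFactors = FALSE
)

def <- data.frame(
  Name = c("name1,name3,name4,name6,name7",
           "name3,name4,name5,name6,name7"),
  A = c(3, 15),
  B = c(4, 3),
  C = c(8, 28),
  stringsAsFactors = FALSE
)

# Full outer join on Name; columns present in both frames get .x / .y suffixes
DF <- merge(x = abc, y = def, by = "Name", all = TRUE)
print(DF)
```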
I then add these ifelse statements to handle the NAs and the non-unique rows. For rows that appear in both data.frames, A and B are summed and C is averaged.
# Take whichever value is present; when both are present, sum A and B and average C.
# (Once both NA checks fail, both values are present, so no further test is needed.)
DF$A <- ifelse(is.na(DF$A.x), DF$A.y,
        ifelse(is.na(DF$A.y), DF$A.x, DF$A.x + DF$A.y))
DF$B <- ifelse(is.na(DF$B.x), DF$B.y,
        ifelse(is.na(DF$B.y), DF$B.x, DF$B.x + DF$B.y))
DF$C <- ifelse(is.na(DF$C.x), DF$C.y,
        ifelse(is.na(DF$C.y), DF$C.x, (DF$C.x + DF$C.y) / 2))
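The three nearly identical ifelse blocks can be factored into one helper. This is a sketch; `combine_cols` is a hypothetical name, and the merged table is rebuilt inline (with C.y = 8 for the shared row, as in the later tables) so the snippet runs on its own.

```r
# The merged table from the question
DF <- data.frame(
  Name = c("name1,name2,name3,name4,name5", "name1,name3,name4,name6,name7",
           "name1,name2,name5,name6,name7", "name3,name4,name5,name6,name7"),
  A.x = c(11, 4, 12, NA), B.x = c(24, 8, 4, NA), C.x = c(7, 12, 5, NA),
  A.y = c(NA, 3, NA, 15), B.y = c(NA, 4, NA, 3), C.y = c(NA, 8, NA, 28),
  stringsAsFactors = FALSE
)

# Hypothetical helper: use whichever value is present; if both are present,
# combine them with the function `both`
combine_cols <- function(x, y, both) {
  ifelse(is.na(x), y, ifelse(is.na(y), x, both(x, y)))
}

DF$A <- combine_cols(DF$A.x, DF$A.y, `+`)
DF$B <- combine_cols(DF$B.x, DF$B.y, `+`)
DF$C <- combine_cols(DF$C.x, DF$C.y, function(x, y) (x + y) / 2)

# Select by name rather than position, so the column order does not matter
merge1 <- DF[c("Name", "A", "B", "C")]
merge1
```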
DF now looks like this:
Name A.x B.x C.x A.y B.y C.y A B C
name1,name2,name3,name4,name5 11 24 7 NA NA NA 11 24 7
name1,name3,name4,name6,name7 4 8 12 3 4 8 7 12 10
name1,name2,name5,name6,name7 12 4 5 NA NA NA 12 4 5
name3,name4,name5,name6,name7 NA NA NA 15 3 28 15 3 28
I then keep only the Name column and the last three columns:
merge1 <- DF[c(1,8,9,10)]
Then I repeat the process for the next two data.frames and call the result merge2. Then I merge merge1 and merge2:
total1 <- merge(x = merge1, y = merge2, by = "Name", all = TRUE)
I continue merging two data frames at a time, then merge the resulting total data.frames together, also two at a time. I get the end result I want, but it is a time-consuming and inefficient process.
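The manual chain of merge1, merge2, total1, ... can be automated with base R's `Reduce()`, which folds a pairwise merge over a whole list. Below is a sketch with a hypothetical `merge_pair()` helper and three hypothetical one-row frames `f1`, `f2`, `f3`. It also shows a caveat of averaging C two frames at a time: when a Name occurs in more than two data frames, the result is a mean of means, which generally differs from the plain mean of all its C values.

```r
# Hypothetical helper: full-join two frames on Name, then sum A and B and
# average C for names present in both (same logic as the ifelse() calls)
merge_pair <- function(x, y) {
  m <- merge(x, y, by = "Name", all = TRUE)
  pick <- function(a, b, f) ifelse(is.na(a), b, ifelse(is.na(b), a, f(a, b)))
  data.frame(
    Name = m$Name,
    A = pick(m$A.x, m$A.y, `+`),
    B = pick(m$B.x, m$B.y, `+`),
    C = pick(m$C.x, m$C.y, function(a, b) (a + b) / 2),
    stringsAsFactors = FALSE
  )
}

# Three small hypothetical frames that all contain the same Name
f1 <- data.frame(Name = "n1", A = 1, B = 1, C = 3, stringsAsFactors = FALSE)
f2 <- data.frame(Name = "n1", A = 1, B = 1, C = 6, stringsAsFactors = FALSE)
f3 <- data.frame(Name = "n1", A = 1, B = 1, C = 9, stringsAsFactors = FALSE)

# Reduce() applies merge_pair(merge_pair(f1, f2), f3)
res <- Reduce(merge_pair, list(f1, f2, f3))
res
# C is ((3 + 6) / 2 + 9) / 2 = 6.75, whereas mean(c(3, 6, 9)) is 6
```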
Another way I think I could do it is to rbind all the data.frames together and then, wherever a row in the Name column has the same list of names as another row, collapse those rows into one, summing column A and column B and taking the mean of column C. But I am not sure how to do that either.
Here is an example of what the rbind result would look like:
Name A B C
name1,name2,name3,name4,name5 11 24 7
name1,name3,name4,name6,name7 4 8 12
name1,name2,name5,name6,name7 12 4 5
name3,name4,name5,name6,name7 15 3 28
name1,name3,name4,name6,name7 3 4 8
The end result would look like this:
Name A B C
name1,name2,name3,name4,name5 11 24 7
name1,name3,name4,name6,name7 7 12 10
name1,name2,name5,name6,name7 12 4 5
name3,name4,name5,name6,name7 15 3 28
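The rbind-then-collapse idea can also be done in base R without extra packages. This is a sketch using `aggregate()` (a swapped-in base R alternative to the data.table and dplyr answers), with the five example rows above typed in as `stacked`. Note that the formula interface of `aggregate()` drops rows containing NA by default, so data with missing values may need `na.action` adjusted.

```r
# The stacked example rows from the question.
# With the real data: stacked <- do.call(rbind, list(abc, def, ...))
stacked <- data.frame(
  Name = c("name1,name2,name3,name4,name5",
           "name1,name3,name4,name6,name7",
           "name1,name2,name5,name6,name7",
           "name3,name4,name5,name6,name7",
           "name1,name3,name4,name6,name7"),
  A = c(11, 4, 12, 15, 3),
  B = c(24, 8, 4, 3, 4),
  C = c(7, 12, 5, 28, 8),
  stringsAsFactors = FALSE
)

# Sum A and B within each Name
sums <- aggregate(cbind(A, B) ~ Name, data = stacked, FUN = sum)
# Mean of C within each Name
means <- aggregate(C ~ Name, data = stacked, FUN = mean)

# One row per Name with columns Name, A, B, C
result <- merge(sums, means, by = "Name")
result
```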
Again, I am sure there are more efficient ways to do this than what I am currently doing, so any help would be greatly appreciated.
2 answers
#1
I think your second approach is the way to go, and you can do that with data.table or dplyr.
Here are a few steps using data.table. First, if your data frames are abc, def, ..., do:
DF <- do.call(rbind, list(abc,def,...))
Now you can convert the combined data frame into a data.table:
library(data.table)
DT <- data.table(DF)
and then do something like:
DTres <- DT[, .(A = sum(A, na.rm = TRUE), B = sum(B, na.rm = TRUE), C = mean(C, na.rm = TRUE)), by = Name]
Check the data.table vignettes to get a better idea of how that package works.
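With 20+ data frames, typing `list(abc, def, ...)` by hand gets tedious. One option, assuming the data frames live in the current environment and you know their names (or they follow a naming pattern you can pass to `ls(pattern = ...)`), is base R's `mget()`, which returns a named list of those objects. The resulting list works with `do.call(rbind, ...)`, and can also be passed directly to `data.table::rbindlist()` or `dplyr::bind_rows()`. The two frames below are hypothetical stand-ins.

```r
# Hypothetical stand-ins for the real data frames
abc <- data.frame(Name = "n1", A = 1, B = 2, C = 3, stringsAsFactors = FALSE)
def <- data.frame(Name = "n1", A = 4, B = 5, C = 6, stringsAsFactors = FALSE)

# mget() looks the objects up by name and returns them as a named list
frames <- mget(c("abc", "def"))

# Stack them into one data frame, ready for grouping by Name
DF <- do.call(rbind, frames)
DF
```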
#2
We can use dplyr:
library(dplyr)
bind_rows(abc, def, ...) %>%
group_by(Name) %>%
summarise(A= sum(A, na.rm= TRUE),
B = sum(B, na.rm= TRUE),
C = mean(C, na.rm=TRUE))