I have a list of data frames with different numbers of rows that I want to merge. There is a lovely solution for merging multiple data frames that I use, and it works:
> go.sigtop.l[c(1:3)]
$SRSF1_cyto
GoTerm PValue Fold.Enrichment
1 lipid kinase activity 0.0044501957 5.378668
2 general RNA polymerase II transcription factor activity 0.0070975052 4.840801
3 protein methyltransferase activity 0.0022675162 4.302935
4 N-methyltransferase activity 0.0089131138 3.850638
5 structure-specific DNA binding 0.0002666942 3.821685
6 purine NTP-dependent helicase activity 0.0007861753 3.377303
$SRSF1_total
GoTerm PValue Fold.Enrichment
1 translation factor activity, nucleic acid binding 1.460691e-04 6.953428
2 structural constituent of ribosome 8.530549e-03 3.948718
3 RNA binding 3.479534e-09 3.675900
4 nucleotide binding 9.800564e-04 1.638817
$SRSF2_cyto
GoTerm PValue Fold.Enrichment
1 protein-lysine N-methyltransferase activity 0.001722436 16.486352
2 lysine N-methyltransferase activity 0.001722436 16.486352
3 histone-lysine N-methyltransferase activity 0.001722436 16.486352
4 histone methyltransferase activity 0.003756630 12.607211
5 N-methyltransferase activity 0.007775608 9.741935
6 protein methyltransferase activity 0.008275521 9.525448
> merge.all <- function(by, ...) {
+ frames <- list(...)
+ df <- Reduce(function(x, y) { merge(x, y, by = by, all = TRUE) }, frames)
+ names(df) <- c(by, paste("V", seq(length(frames)), sep = ""))
+
+ return(df)
+ }
> go.df <- merge.all("GoTerm", go.sigtop.l[[1]], go.sigtop.l[[2]], go.sigtop.l[[3]])
> go.df
GoTerm V1 V2 V3 NA NA NA
1 general RNA polymerase II transcription factor activity 0.0070975052 4.840801 NA NA NA NA
2 histone-lysine N-methyltransferase activity NA NA NA NA 0.001722436 16.486352
3 histone methyltransferase activity NA NA NA NA 0.003756630 12.607211
4 lipid kinase activity 0.0044501957 5.378668 NA NA NA NA
5 lysine N-methyltransferase activity NA NA NA NA 0.001722436 16.486352
6 N-methyltransferase activity 0.0089131138 3.850638 NA NA 0.007775608 9.741935
7 nucleotide binding NA NA 9.800564e-04 1.638817 NA NA
8 protein-lysine N-methyltransferase activity NA NA NA NA 0.001722436 16.486352
9 protein methyltransferase activity 0.0022675162 4.302935 NA NA 0.008275521 9.525448
10 purine NTP-dependent helicase activity 0.0007861753 3.377303 NA NA NA NA
11 RNA binding NA NA 3.479534e-09 3.675900 NA NA
12 structural constituent of ribosome NA NA 8.530549e-03 3.948718 NA NA
13 structure-specific DNA binding 0.0002666942 3.821685 NA NA NA NA
14 translation factor activity, nucleic acid binding NA NA 1.460691e-04 6.953428 NA NA
but the issue is that the number of data frames in the list will vary. How can I pass all the elements automatically, regardless of how many the list contains? I've tried:
merge.all("GoTerm", go.sigtop.l[c(1:length(names(go.sigtop.l)))])
but that did not work.
I am aware of many answers to similar questions, but none of those I've seen solve my problem. Cheers.
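A likely reason the attempt above fails: the subsetted list is passed as one single argument, so inside `merge.all` the call `list(...)` yields a list of length one and `Reduce` has nothing to merge. Below is a minimal sketch of spreading the list elements into `...` with `do.call`. The data frames in `toy.l` are made up for illustration and carry only one value column each, so that the `V1`, `V2`, ... naming in `merge.all` lines up.

```r
# Toy stand-in for go.sigtop.l (made-up values, one value column each).
toy.l <- list(
  a = data.frame(GoTerm = c("RNA binding", "nucleotide binding"),
                 PValue = c(0.01, 0.02)),
  b = data.frame(GoTerm = c("RNA binding", "lipid kinase activity"),
                 PValue = c(0.03, 0.04)),
  c = data.frame(GoTerm = "nucleotide binding", PValue = 0.05)
)

# Same function as in the question.
merge.all <- function(by, ...) {
  frames <- list(...)
  df <- Reduce(function(x, y) merge(x, y, by = by, all = TRUE), frames)
  names(df) <- c(by, paste("V", seq(length(frames)), sep = ""))
  df
}

# do.call() builds the call merge.all("GoTerm", toy.l[[1]], toy.l[[2]], ...)
# for a list of any length. unname() keeps the list names from being
# treated as argument names.
toy.df <- do.call(merge.all, c(list("GoTerm"), unname(toy.l)))
```

With the real list, the equivalent call would presumably be `do.call(merge.all, c(list("GoTerm"), unname(go.sigtop.l)))`.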
2 Solutions
#1
1
This is not pretty, but it can be done with a for loop. If a better solution comes along, I'll accept it instead of this one:
# Start from the first data frame and merge in each remaining one by GoTerm.
df.m <- go.sigtop.l[[1]]
for (i in 2:length(go.sigtop.l)) {
  df.m <- merge(df.m, go.sigtop.l[[i]], by = "GoTerm", all = TRUE,
                suffixes = paste0(".", names(go.sigtop.l)[c(i - 1, i)]))
}
# Terms missing from a data frame come out as NA; set them to 0.
df.m[is.na(df.m)] <- 0
> head(df.m)
GoTerm PValue.SRSF1_cyto Fold.Enrichment.SRSF1_cyto PValue.SRSF1_total Fold.Enrichment.SRSF1_total PValue.SRSF2_cyto
1 aminoacyl-tRNA ligase activity 0.000000000 0.000000 0 0 0
2 beta-catenin binding 0.000000000 0.000000 0 0 0
3 cell adhesion molecule binding 0.000000000 0.000000 0 0 0
4 cytochrome-c oxidase activity 0.000000000 0.000000 0 0 0
5 cytoskeletal protein binding 0.000000000 0.000000 0 0 0
6 general RNA polymerase II transcription factor activity 0.007097505 4.840801 0 0 0
Fold.Enrichment.SRSF2_cyto PValue.SRSF2_total Fold.Enrichment.SRSF2_total PValue.SRSF3_cyto Fold.Enrichment.SRSF3_cyto PValue.SRSF3_total Fold.Enrichment.SRSF3_total
1 0 0 0 0.000000000 0.000000 0 0
2 0 0 0 0.000186408 5.037574 0 0
3 0 0 0 0.000000000 0.000000 0 0
4 0 0 0 0.000000000 0.000000 0 0
5 0 0 0 0.000000000 0.000000 0 0
6 0 0 0 0.000000000 0.000000 0 0
PValue.SRSF4_cyto Fold.Enrichment.SRSF4_cyto PValue.SRSF4_total Fold.Enrichment.SRSF4_total PValue.SRSF5_cyto Fold.Enrichment.SRSF5_cyto PValue.SRSF5_total
1 0.0000000 0.00000 0.0000000 0.000000 0 0 0
2 0.0000000 0.00000 0.0000000 0.000000 0 0 0
3 0.0000000 0.00000 0.0000000 0.000000 0 0 0
4 0.0025874 14.26516 0.0000000 0.000000 0 0 0
5 0.0000000 0.00000 0.0053485 4.239176 0 0 0
6 0.0000000 0.00000 0.0000000 0.000000 0 0 0
Fold.Enrichment.SRSF5_total PValue.SRSF6_cyto Fold.Enrichment.SRSF6_cyto PValue.SRSF6_total Fold.Enrichment.SRSF6_total PValue.SRSF7_cyto Fold.Enrichment.SRSF7_cyto
1 0 0.0007474458 12.03623 0 0 0 0
2 0 0.0000000000 0.00000 0 0 0 0
3 0 0.0000000000 0.00000 0 0 0 0
4 0 0.0000000000 0.00000 0 0 0 0
5 0 0.0000000000 0.00000 0 0 0 0
6 0 0.0000000000 0.00000 0 0 0 0
PValue.SRSF7_total Fold.Enrichment.SRSF7_total
1 0.000000000 0.00000
2 0.000000000 0.00000
3 0.009078473 20.42213
4 0.000000000 0.00000
5 0.000000000 0.00000
6 0.000000000 0.00000
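One caveat with the loop above: `merge` applies `suffixes` only to column names that clash between its two inputs, so the labelling depends on the merge order, and with an odd number of data frames the last one's columns would, as far as I can tell, keep their plain names. A sketch of an alternative, using made-up data in place of `go.sigtop.l`: tag each data frame's value columns with its list name first, then fold the whole list with `Reduce`.

```r
# Toy stand-in for go.sigtop.l (values are made up for illustration).
toy.l <- list(
  SRSF1_cyto  = data.frame(GoTerm = c("RNA binding", "nucleotide binding"),
                           PValue = c(0.01, 0.02),
                           Fold.Enrichment = c(3.1, 1.6)),
  SRSF1_total = data.frame(GoTerm = c("RNA binding", "lipid kinase activity"),
                           PValue = c(0.03, 0.04),
                           Fold.Enrichment = c(2.5, 5.4)),
  SRSF2_cyto  = data.frame(GoTerm = "nucleotide binding",
                           PValue = 0.05,
                           Fold.Enrichment = 9.7)
)

# Append ".<list name>" to every column except the key.
tagged.l <- Map(function(df, nm) {
  value.cols <- names(df) != "GoTerm"
  names(df)[value.cols] <- paste(names(df)[value.cols], nm, sep = ".")
  df
}, toy.l, names(toy.l))

# Full outer join across the whole list, however long it is.
toy.m <- Reduce(function(x, y) merge(x, y, by = "GoTerm", all = TRUE), tagged.l)
```

Because every value column is already unique before merging, no suffixes are needed and the number of list elements does not matter.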
#2
0
Have you tried this function?
http://rss.acs.unt.edu/Rdoc/library/gtools/html/smartbind.html
See this example from the linked page:
df1 <- data.frame(list(A=1:10), B=LETTERS[1:10], C=rnorm(10) )
df2 <- data.frame(A=11:20, D=rnorm(10), E=letters[1:10] )
df3 <- df1
df4 <- df2
out <- smartbind(list = list(df1, df2, df3, df4))
In our case:
out <- smartbind(list = go.sigtop.l)
EDITED response.