Merging data.frames in a list: how to select multiple elements

Time: 2021-04-03 22:55:02

I have a list of data frames that I want to merge, and each one has a different number of rows. I already use a nice solution for merging multiple data frames, and it works:

> go.sigtop.l[c(1:3)]
$SRSF1_cyto
                                                   GoTerm       PValue Fold.Enrichment
1                                   lipid kinase activity 0.0044501957        5.378668
2 general RNA polymerase II transcription factor activity 0.0070975052        4.840801
3                      protein methyltransferase activity 0.0022675162        4.302935
4                            N-methyltransferase activity 0.0089131138        3.850638
5                          structure-specific DNA binding 0.0002666942        3.821685
6                  purine NTP-dependent helicase activity 0.0007861753        3.377303

$SRSF1_total
                                             GoTerm       PValue Fold.Enrichment
1 translation factor activity, nucleic acid binding 1.460691e-04        6.953428
2                structural constituent of ribosome 8.530549e-03        3.948718
3                                       RNA binding 3.479534e-09        3.675900
4                                nucleotide binding 9.800564e-04        1.638817

$SRSF2_cyto
                                       GoTerm      PValue Fold.Enrichment
1 protein-lysine N-methyltransferase activity 0.001722436       16.486352
2         lysine N-methyltransferase activity 0.001722436       16.486352
3 histone-lysine N-methyltransferase activity 0.001722436       16.486352
4          histone methyltransferase activity 0.003756630       12.607211
5                N-methyltransferase activity 0.007775608        9.741935
6          protein methyltransferase activity 0.008275521        9.525448

> merge.all <- function(by, ...) {
+   frames <- list(...)
+   df <- Reduce(function(x, y) { merge(x, y, by = by, all = TRUE) }, frames)
+   names(df) <- c(by, paste("V", seq(length(frames)), sep = ""))
+   
+   return(df)
+ }
> go.df <- merge.all("GoTerm", go.sigtop.l[[1]], go.sigtop.l[[2]], go.sigtop.l[[3]])
> go.df
                                                    GoTerm           V1       V2           V3       NA          NA        NA
1  general RNA polymerase II transcription factor activity 0.0070975052 4.840801           NA       NA          NA        NA
2              histone-lysine N-methyltransferase activity           NA       NA           NA       NA 0.001722436 16.486352
3                       histone methyltransferase activity           NA       NA           NA       NA 0.003756630 12.607211
4                                    lipid kinase activity 0.0044501957 5.378668           NA       NA          NA        NA
5                      lysine N-methyltransferase activity           NA       NA           NA       NA 0.001722436 16.486352
6                             N-methyltransferase activity 0.0089131138 3.850638           NA       NA 0.007775608  9.741935
7                                       nucleotide binding           NA       NA 9.800564e-04 1.638817          NA        NA
8              protein-lysine N-methyltransferase activity           NA       NA           NA       NA 0.001722436 16.486352
9                       protein methyltransferase activity 0.0022675162 4.302935           NA       NA 0.008275521  9.525448
10                  purine NTP-dependent helicase activity 0.0007861753 3.377303           NA       NA          NA        NA
11                                             RNA binding           NA       NA 3.479534e-09 3.675900          NA        NA
12                      structural constituent of ribosome           NA       NA 8.530549e-03 3.948718          NA        NA
13                          structure-specific DNA binding 0.0002666942 3.821685           NA       NA          NA        NA
14       translation factor activity, nucleic acid binding           NA       NA 1.460691e-04 6.953428          NA        NA
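The `NA` headers in the output above come from the renaming step in `merge.all()`. `names(df)` receives `1 + length(frames)` names (four here), but the merged result has seven columns because every input contributes two value columns. R pads the missing names with `NA`. Below is a minimal sketch of a variant that takes the list directly and suffixes each frame's value columns with its list name before merging. This is not the poster's code. The helper name `merge_list` and the toy data are hypothetical, and the sketch assumes the list is named and every frame has the key column.

```r
# Hypothetical helper (not from the post): merge a named list of data frames
# on a shared key column. Each frame's non-key columns get the frame's list
# name as a suffix first, so names stay unique however many frames there are.
merge_list <- function(frames, by) {
  labelled <- Map(function(df, label) {
    is.value <- !(names(df) %in% by)
    names(df)[is.value] <- paste(names(df)[is.value], label, sep = ".")
    df
  }, frames, names(frames))
  # Full outer join: keep every key that appears in any of the frames
  Reduce(function(x, y) merge(x, y, by = by, all = TRUE), labelled)
}

# Toy data shaped like go.sigtop.l (made-up values, for illustration only)
frames <- list(
  A = data.frame(GoTerm = c("t1", "t2"), PValue = c(0.01, 0.02), Fold.Enrichment = c(2, 3)),
  B = data.frame(GoTerm = c("t2", "t3"), PValue = c(0.03, 0.04), Fold.Enrichment = c(4, 5)),
  C = data.frame(GoTerm = "t1", PValue = 0.05, Fold.Enrichment = 6)
)
merged <- merge_list(frames, by = "GoTerm")
print(merged)
```

With the real list, the call would be `merge_list(go.sigtop.l, by = "GoTerm")`. That should yield columns such as `PValue.SRSF1_cyto` and `Fold.Enrichment.SRSF1_cyto`.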

The problem is that the number of data frames in the list will vary. How can I pass all of the list's elements automatically, regardless of how many it contains? I've tried:

merge.all("GoTerm", go.sigtop.l[c(1:length(names(go.sigtop.l)))]) 

but that did not work.
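The attempt above fails because `merge.all()` gathers its data frames with `list(...)`. Passing the whole list as one argument makes `frames` a list of length one, whose only element is the original list. `Reduce()` then has nothing to merge and returns that inner list unchanged. `do.call()` instead spreads the list elements into separate arguments. The sketch below reuses the question's function on three made-up toy frames. With the real data, the call would be `do.call(merge.all, c(list("GoTerm"), unname(go.sigtop.l)))`.

```r
# merge.all as defined in the question
merge.all <- function(by, ...) {
  frames <- list(...)
  df <- Reduce(function(x, y) { merge(x, y, by = by, all = TRUE) }, frames)
  names(df) <- c(by, paste("V", seq(length(frames)), sep = ""))
  return(df)
}

# Toy stand-in for go.sigtop.l (one value column per frame, made-up values)
frames <- list(
  a = data.frame(GoTerm = c("t1", "t2"), P = c(0.1, 0.2)),
  b = data.frame(GoTerm = c("t2", "t3"), P = c(0.3, 0.4)),
  c = data.frame(GoTerm = c("t1", "t3"), P = c(0.5, 0.6))
)

# Passing the list as a single argument: inside merge.all, frames has length 1,
# so Reduce() hands back the inner list instead of a merged data frame
wrong <- merge.all("GoTerm", frames)
print(class(wrong))

# do.call() turns each list element into its own argument to merge.all().
# unname() is a precaution: an element named "b" would otherwise be
# partially matched to the 'by' argument.
right <- do.call(merge.all, c(list("GoTerm"), unname(frames)))
print(right)
```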


I am aware of many answers to similar questions, but none of the ones I've seen solve my problem. Cheers.

2 solutions

#1



This is not pretty, but it can be done with a for loop. If a better solution comes along, I'll accept it instead of this one:

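df.m <- go.sigtop.l[[1]]
# Merge the remaining data frames into the running result one at a time
# (seq_along(...)[-1] also copes with a list of length 1)
for (i in seq_along(go.sigtop.l)[-1]) {
  df.m <- merge(df.m, go.sigtop.l[[i]], by = "GoTerm", all = TRUE,
                suffixes = paste(".", names(go.sigtop.l)[c(i - 1, i)], sep = ""))
}
# GO terms absent from a data frame come back as NA; replace them with 0
df.m[is.na(df.m)] <- 0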

> head(df.m)
                                                   GoTerm PValue.SRSF1_cyto Fold.Enrichment.SRSF1_cyto PValue.SRSF1_total Fold.Enrichment.SRSF1_total PValue.SRSF2_cyto
1                          aminoacyl-tRNA ligase activity       0.000000000                   0.000000                  0                           0                 0
2                                    beta-catenin binding       0.000000000                   0.000000                  0                           0                 0
3                          cell adhesion molecule binding       0.000000000                   0.000000                  0                           0                 0
4                           cytochrome-c oxidase activity       0.000000000                   0.000000                  0                           0                 0
5                            cytoskeletal protein binding       0.000000000                   0.000000                  0                           0                 0
6 general RNA polymerase II transcription factor activity       0.007097505                   4.840801                  0                           0                 0
  Fold.Enrichment.SRSF2_cyto PValue.SRSF2_total Fold.Enrichment.SRSF2_total PValue.SRSF3_cyto Fold.Enrichment.SRSF3_cyto PValue.SRSF3_total Fold.Enrichment.SRSF3_total
1                          0                  0                           0       0.000000000                   0.000000                  0                           0
2                          0                  0                           0       0.000186408                   5.037574                  0                           0
3                          0                  0                           0       0.000000000                   0.000000                  0                           0
4                          0                  0                           0       0.000000000                   0.000000                  0                           0
5                          0                  0                           0       0.000000000                   0.000000                  0                           0
6                          0                  0                           0       0.000000000                   0.000000                  0                           0
  PValue.SRSF4_cyto Fold.Enrichment.SRSF4_cyto PValue.SRSF4_total Fold.Enrichment.SRSF4_total PValue.SRSF5_cyto Fold.Enrichment.SRSF5_cyto PValue.SRSF5_total
1         0.0000000                    0.00000          0.0000000                    0.000000                 0                          0                  0
2         0.0000000                    0.00000          0.0000000                    0.000000                 0                          0                  0
3         0.0000000                    0.00000          0.0000000                    0.000000                 0                          0                  0
4         0.0025874                   14.26516          0.0000000                    0.000000                 0                          0                  0
5         0.0000000                    0.00000          0.0053485                    4.239176                 0                          0                  0
6         0.0000000                    0.00000          0.0000000                    0.000000                 0                          0                  0
  Fold.Enrichment.SRSF5_total PValue.SRSF6_cyto Fold.Enrichment.SRSF6_cyto PValue.SRSF6_total Fold.Enrichment.SRSF6_total PValue.SRSF7_cyto Fold.Enrichment.SRSF7_cyto
1                           0      0.0007474458                   12.03623                  0                           0                 0                          0
2                           0      0.0000000000                    0.00000                  0                           0                 0                          0
3                           0      0.0000000000                    0.00000                  0                           0                 0                          0
4                           0      0.0000000000                    0.00000                  0                           0                 0                          0
5                           0      0.0000000000                    0.00000                  0                           0                 0                          0
6                           0      0.0000000000                    0.00000                  0                           0                 0                          0
  PValue.SRSF7_total Fold.Enrichment.SRSF7_total
1        0.000000000                     0.00000
2        0.000000000                     0.00000
3        0.009078473                    20.42213
4        0.000000000                     0.00000
5        0.000000000                     0.00000
6        0.000000000                     0.00000
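This loop has one caveat: `merge()` applies `suffixes` only to non-key columns whose names occur in both inputs. After the second merge, every column in `df.m` already carries a suffix. The third frame's `PValue` and `Fold.Enrichment` columns are therefore added without a suffix and only get renamed when the fourth frame clashes with them. With the 14 frames here, that pairing works out. With an odd number of frames, however, the last frame keeps unsuffixed names. The sketch below uses a hypothetical three-frame toy list to show the effect. It also shows a rename-first alternative that does not depend on the pairing.

```r
# Toy stand-in for go.sigtop.l with an odd number of frames (made-up values)
go.sigtop.l <- list(
  X1 = data.frame(GoTerm = c("t1", "t2"), PValue = c(0.01, 0.02), Fold.Enrichment = c(2, 3)),
  X2 = data.frame(GoTerm = c("t2", "t3"), PValue = c(0.03, 0.04), Fold.Enrichment = c(4, 5)),
  X3 = data.frame(GoTerm = c("t1", "t3"), PValue = c(0.05, 0.06), Fold.Enrichment = c(6, 7))
)

# The answer's loop: suffixes are only added when column names clash
df.m <- go.sigtop.l[[1]]
for (i in seq_along(go.sigtop.l)[-1]) {
  df.m <- merge(df.m, go.sigtop.l[[i]], by = "GoTerm", all = TRUE,
                suffixes = paste(".", names(go.sigtop.l)[c(i - 1, i)], sep = ""))
}
print(names(df.m))  # the X3 columns keep the bare names PValue / Fold.Enrichment

# Rename-first alternative: label every frame's value columns before merging
df.r <- NULL
for (nm in names(go.sigtop.l)) {
  df <- go.sigtop.l[[nm]]
  is.value <- names(df) != "GoTerm"
  names(df)[is.value] <- paste(names(df)[is.value], nm, sep = ".")
  df.r <- if (is.null(df.r)) df else merge(df.r, df, by = "GoTerm", all = TRUE)
}
print(names(df.r))
```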

#2



Have you tried the `smartbind` function from the gtools package?

http://rss.acs.unt.edu/Rdoc/library/gtools/html/smartbind.html

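Based on the example in the link:

library(gtools)  # provides smartbind()

df1 <- data.frame(A=1:10, B=LETTERS[1:10], C=rnorm(10))
df2 <- data.frame(A=11:20, D=rnorm(10), E=letters[1:10])
df3 <- df1
df4 <- df2

# do.call() hands each data frame in the list to smartbind() as its own argument
out <- do.call(smartbind, list(df1, df2, df3, df4))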

In your case:

out <- do.call(smartbind, go.sigtop.l)

EDITED response.
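`smartbind()` combines data frames row-wise, like `rbind()`, and fills in columns that a frame lacks with `NA`. It does not join rows on `GoTerm`. Every frame in `go.sigtop.l` has the same three columns, so the result is essentially a long table with one row per (frame, GO term) pair. That differs from the one-row-per-term layout that `merge()` produces. The base-R sketch below illustrates this stacking behavior without gtools. `bind_fill` is a hypothetical helper written for illustration, not the gtools implementation.

```r
# Hypothetical helper: row-bind data frames whose columns differ,
# padding each frame's missing columns with NA before calling rbind()
bind_fill <- function(frames) {
  all.cols <- unique(unlist(lapply(frames, names)))
  padded <- lapply(frames, function(df) {
    for (col in setdiff(all.cols, names(df))) df[[col]] <- NA
    df[all.cols]  # same column order in every frame
  })
  do.call(rbind, unname(padded))
}

df1 <- data.frame(A = 1:3, B = c("a", "b", "c"))
df2 <- data.frame(A = 4:5, D = c(0.5, 1.5))
out <- bind_fill(list(df1, df2))
print(out)
```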
