Find unique rows using the strings in a dataframe column and their corresponding values

Time: 2022-04-22 19:36:01

I have a dataframe:

gene=c("Esr", "Esr", "Esr", "Nop", "Nop", "Nop", "Stu", "Mkp", "Mkp", "P53", "Ard", "Ard")
int_1=c(34,56,544,566,123,00,343,56,22,10,11,19)
int_2=c(24,26,58,56,13,00,34,6,22,10,119,109)
int_3=c(14,36,54,566,12,00,43,56,00,770,11,119)
df1 = cbind.data.frame(gene, int_1, int_2, int_3)
  1. df1 has 26,000 rows and 36 columns.
  2. I want to make a new data frame, df2, in which each unique string in the "gene" column appears once and, for each intensity column, the values of all rows with that gene are summed together.
  3. In df1 the gene names appear multiple times; df2 will have each gene only once.

I am trying to use tidyverse packages, so a solution using them would be very much appreciated (if possible). Thank you so much.

1 answer

#1

We can group by gene and use dplyr::summarise_all, which applies a function to every non-grouping column.

(1) To average the values:

library(tidyverse)
df2 <- df1 %>%
    group_by(gene) %>%
    summarise_all(mean)
df2
## A tibble: 6 x 4
#  gene  int_1 int_2 int_3
#  <fct> <dbl> <dbl> <dbl>
#1 Ard    15.0  114.  65.0
#2 Esr   211.    36.  34.7
#3 Mkp    39.0   14.  28.0
#4 Nop   230.    23. 193.
#5 P53    10.0   10. 770.
#6 Stu   343.    34.  43.0
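
On the full 26,000-row data, a single NA among a gene's rows makes mean() return NA for that gene. Below is a minimal sketch, assuming missing values should simply be ignored. It also uses across(), which superseded summarise_all() in dplyr 1.0.0. The NA inserted below is hypothetical and only there to show the effect:

```r
library(dplyr)

# Example data from the question
df1 <- data.frame(
  gene  = c("Esr", "Esr", "Esr", "Nop", "Nop", "Nop", "Stu", "Mkp", "Mkp", "P53", "Ard", "Ard"),
  int_1 = c(34, 56, 544, 566, 123, 0, 343, 56, 22, 10, 11, 19),
  int_2 = c(24, 26, 58, 56, 13, 0, 34, 6, 22, 10, 119, 109),
  int_3 = c(14, 36, 54, 566, 12, 0, 43, 56, 0, 770, 11, 119)
)
df1$int_1[2] <- NA  # hypothetical missing value in an Esr row, not in the original data

# Average every non-grouping column per gene, skipping NAs
df2_mean <- df1 %>%
  group_by(gene) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))
df2_mean
```

With the NA skipped, the Esr value for int_1 is the mean of the two remaining rows instead of NA.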

(2) To sum the values, as the question asks:

df2 <- df1 %>%
    group_by(gene) %>%
    summarise_all(sum)
df2
## A tibble: 6 x 4
#  gene  int_1 int_2 int_3
#  <fct> <dbl> <dbl> <dbl>
#1 Ard     30.  228.  130.
#2 Esr    634.  108.  104.
#3 Mkp     78.   28.   56.
#4 Nop    689.   69.  578.
#5 P53     10.   10.  770.
#6 Stu    343.   34.   43.
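
The same sum can be written with across(). The sketch below assumes that the full 36-column df1 may contain non-numeric columns besides gene. The batch column is hypothetical and stands in for such a column. where(is.numeric) limits the summing to numeric columns, so sum() is never applied to text, which would raise an error:

```r
library(dplyr)

# Example data from the question
df1 <- data.frame(
  gene  = c("Esr", "Esr", "Esr", "Nop", "Nop", "Nop", "Stu", "Mkp", "Mkp", "P53", "Ard", "Ard"),
  int_1 = c(34, 56, 544, 566, 123, 0, 343, 56, 22, 10, 11, 19),
  int_2 = c(24, 26, 58, 56, 13, 0, 34, 6, 22, 10, 119, 109),
  int_3 = c(14, 36, 54, 566, 12, 0, 43, 56, 0, 770, 11, 119)
)
df1$batch <- "run1"  # hypothetical text column, not part of the original data

# One row per gene; every numeric column is summed, text columns are skipped
df2 <- df1 %>%
  group_by(gene) %>%
  summarise(across(where(is.numeric), sum))
df2
```

Because across() selects columns by a predicate rather than by name, this code does not need to change when the real data has 36 intensity columns instead of three.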

Or, in base R, you can use aggregate:

aggregate(cbind(int_1, int_2, int_3) ~ gene, data = df1, sum)
#  gene int_1 int_2 int_3
#1  Ard    30   228   130
#2  Esr   634   108   104
#3  Mkp    78    28    56
#4  Nop   689    69   578
#5  P53    10    10   770
#6  Stu   343    34    43
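
Listing all 36 columns inside cbind() quickly becomes unwieldy. In aggregate()'s formula interface, a dot on the left-hand side stands for every column not used on the right. The sketch below therefore sums all intensity columns without naming them:

```r
# Example data from the question
df1 <- data.frame(
  gene  = c("Esr", "Esr", "Esr", "Nop", "Nop", "Nop", "Stu", "Mkp", "Mkp", "P53", "Ard", "Ard"),
  int_1 = c(34, 56, 544, 566, 123, 0, 343, 56, 22, 10, 11, 19),
  int_2 = c(24, 26, 58, 56, 13, 0, 34, 6, 22, 10, 119, 109),
  int_3 = c(14, 36, 54, 566, 12, 0, 43, 56, 0, 770, 11, 119)
)

# ". ~ gene": sum every column other than gene within each gene
df2 <- aggregate(. ~ gene, data = df1, FUN = sum)
df2
```

The formula method of aggregate() uses na.action = na.omit by default. Any row with an NA in any column is therefore dropped entirely before summing. In contrast, summarise_all(sum) returns NA for the affected gene.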