Averaging duplicate values in an R data frame

Time: 2021-09-24 12:58:30

I have a df named ColorMap in which I am looking to average all numerical values corresponding to the same feature (further explanation below). Here is the df.

> ColorMap
    KEGGnumber Colors
1   c("C00489"  0.162
2     "C06104"  0.162
3    "C02656")  0.162
4       C00163 -0.173
5   c("C02656" -0.140
6     "C00036" -0.140
7     "C00232" -0.140
8     "C01571" -0.140
9    "C00422") -0.140
10  c("C00402"  0.147
11    "C06664"  0.147
12    "C06687"  0.147
13   "C02059")  0.147
14  c("C00246"  0.069
15   "C00902")  0.069
**16      C00033  0.011
...
25      C00033 -0.073**
26      C00048  0.259
**27  c("C00803"  0.063
...
37      C00803 -0.200
38      C00803 -0.170**
39  c("C00164" -0.020
40    "C01712" -0.020
...
165 c("C00246"  0.076
166  "C00902")  0.076
**167     C00163 -0.063
...
169     C00163  0.046**
170 c("C00058" -0.208
171  "C00036") -0.208
172     C00121 -0.178
173     C00033 -0.193
174     C00163 -0.085

I would like the final result to look something like this:

> ColorMap
    KEGGnumber Colors
1      C00489   0.162
2      C06104   0.162
3      C02656   0.162
4      C00163  -0.173
5      C02656  -0.140
6      C00036  -0.140
7      C00232  -0.140
8      C01571  -0.140
9      C00422  -0.140
10     C00402   0.147
11     C06664   0.147
12     C06687   0.147
13     C02059   0.147
14     C00246   0.069
15     C00902   0.069
**16   C00033   0.031**
26     C00048   0.259
**27   C00803  -0.100**
39     C00164  -0.020
40     C01712  -0.020
...
165    C00246   0.076
166    C00902   0.076
**167  C00163   0.0085**
170    C00058  -0.208
171    C00036  -0.208
172    C00121  -0.178
173    C00033  -0.193
174    C00163  -0.085

The rows do not need to be next to each other; I simply chose those for easy visualization. I would like the mean of all Colors values for each KEGGnumber, so that each KEGGnumber in the result is unique, with no duplicates.

1 solution

#1


You can clean that column using stringr:

library(stringr)
ColorMap$KEGGnumber <- str_extract(ColorMap$KEGGnumber, "[C][0-9]+")

The pattern argument of str_extract takes a regular expression. This one is simple: it matches the capital letter C followed by one or more digits, so the surrounding c(, quotes, and closing parentheses are dropped.
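To see what that pattern keeps, here is a minimal sketch on a few values copied from the question. It uses base R (regexpr with regmatches) instead of stringr so it runs without extra packages; the variable names raw and ids are only for illustration.

```r
# Sample values copied from the question's KEGGnumber column
raw <- c('c("C00489"', '"C06104"', '"C02656")', 'C00163')

# regexpr() locates the first match of the pattern in each string;
# regmatches() returns the matched text.
# Unlike str_extract(), regmatches() silently drops strings with no match
# instead of returning NA, so this only lines up when every value matches.
ids <- regmatches(raw, regexpr("C[0-9]+", raw))
print(ids)
```

The pattern is case-sensitive, so the lowercase c in c( is not mistaken for the start of an identifier.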

Afterwards, group by KEGGnumber with dplyr and take the mean of each group:

library(dplyr)
# One row per KEGGnumber; the averaged column keeps the name Colors
ColorMap %>% group_by(KEGGnumber) %>% summarize(Colors = mean(Colors))
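As a rough end-to-end check, the same two steps can be sketched in base R on a handful of rows. The data frame sample_map below is a hypothetical subset built from values in the question, not the full ColorMap, and aggregate() stands in for the dplyr grouping.

```r
# Hypothetical subset of the question's data (not the full ColorMap)
sample_map <- data.frame(
  KEGGnumber = c('c("C00246"', '"C00902")', 'C00163',
                 'c("C00246"', '"C00902")', 'C00163'),
  Colors = c(0.069, 0.069, -0.173, 0.076, 0.076, -0.085),
  stringsAsFactors = FALSE
)

# Step 1: keep only the identifier (every sample value contains one)
sample_map$KEGGnumber <- regmatches(
  sample_map$KEGGnumber,
  regexpr("C[0-9]+", sample_map$KEGGnumber)
)

# Step 2: average Colors within each KEGGnumber
averaged <- aggregate(Colors ~ KEGGnumber, data = sample_map, FUN = mean)
print(averaged)
```

The result has one row per identifier, which is the uniqueness the question asks for. Note that this averages every occurrence of an identifier across the whole table, not just adjacent rows.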
