I have a data frame named ColorMap in which I want to average all numerical values that correspond to the same feature (further explanation below). Here is the data frame.
> ColorMap
KEGGnumber Colors
1 c("C00489" 0.162
2 "C06104" 0.162
3 "C02656") 0.162
4 C00163 -0.173
5 c("C02656" -0.140
6 "C00036" -0.140
7 "C00232" -0.140
8 "C01571" -0.140
9 "C00422") -0.140
10 c("C00402" 0.147
11 "C06664" 0.147
12 "C06687" 0.147
13 "C02059") 0.147
14 c("C00246" 0.069
15 "C00902") 0.069
**16 C00033 0.011
...
25 C00033 -0.073**
26 C00048 0.259
**27 c("C00803" 0.063
...
37 C00803 -0.200
38 C00803 -0.170**
39 c("C00164" -0.020
40 "C01712" -0.020
...
165 c("C00246" 0.076
166 "C00902") 0.076
**167 C00163 -0.063
...
169 C00163 0.046**
170 c("C00058" -0.208
171 "C00036") -0.208
172 C00121 -0.178
173 C00033 -0.193
174 C00163 -0.085
I would like the final result to look something like this:
> ColorMap
KEGGnumber Colors
1 C00489 0.162
2 C06104 0.162
3 C02656 0.162
4 C00163 -0.173
5 C02656 -0.140
6 C00036 -0.140
7 C00232 -0.140
8 C01571 -0.140
9 C00422 -0.140
10 C00402 0.147
11 C06664 0.147
12 C06687 0.147
13 C02059 0.147
14 C00246 0.069
15 C00902 0.069
**16 C00033 0.031**
26 C00048 0.259
**27 C00803 -0.100**
39 C00164 -0.020
40 C01712 -0.020
...
165 C00246 0.076
166 C00902 0.076
**167 C00163 0.0085**
170 C00058 -0.208
171 C00036 -0.208
172 C00121 -0.178
173 C00033 -0.193
174 C00163 -0.085
The rows do not need to be next to each other; I simply chose those for easy visualization. I would like the mean of all Colors values for each KEGGnumber value, so that each KEGGnumber value is unique and there are no duplicates.
1 solution
#1
You can clean that column using str_extract from the stringr package:
library(stringr)
ColorMap$KEGGnumber <- str_extract(ColorMap$KEGGnumber, "[C][0-9]+")
The pattern argument takes a regular expression. Here it is a simple one that matches the capital letter C followed by one or more digits, so the surrounding c(, quotes, and parentheses are dropped.
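To see what the pattern does before applying it to the whole column, here is a small sketch on a few sample strings. The vector `raw` is made up for illustration and only mimics the shapes seen in the question; the last element is a hypothetical entry with no KEGG identifier.

```r
library(stringr)

# Hypothetical sample strings shaped like the KEGGnumber column
raw <- c('c("C00489"', '"C02656")', "C00163", "no id here")

# str_extract returns the first match per string, or NA when nothing matches
ids <- str_extract(raw, "[C][0-9]+")
print(ids)
```

Because unmatched entries come back as NA, it may be worth checking `sum(is.na(ids))` on the real column before grouping.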
Afterwards, group by KEGGnumber with dplyr and take the mean of Colors within each group:
library(dplyr)
# Name the summary column so the result keeps a usable column name
ColorMap %>% group_by(KEGGnumber) %>% summarize(Colors = mean(Colors))
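If stringr and dplyr are not available, the same two steps (clean the identifier, then average per identifier) can be sketched in base R. This is only an alternative under stated assumptions, not the method the answer uses: the toy data frame below is invented for illustration, and `regexpr`, `regmatches`, and `aggregate` stand in for `str_extract` and `group_by`/`summarize`.

```r
# Toy data (hypothetical values, shaped like the ColorMap in the question)
ColorMap <- data.frame(
  KEGGnumber = c('c("C00489"', '"C06104")', "C00033", "C00033",
                 'c("C00803"', "C00803"),
  Colors     = c(0.162, 0.162, 0.011, -0.073, 0.063, -0.200),
  stringsAsFactors = FALSE
)

# Step 1: keep only the identifier (capital C followed by one or more digits).
# Rows without a match are set to NA instead of silently shifting positions.
m   <- regexpr("C[0-9]+", ColorMap$KEGGnumber)
ids <- rep(NA_character_, nrow(ColorMap))
ids[m != -1] <- regmatches(ColorMap$KEGGnumber, m)
ColorMap$KEGGnumber <- ids

# Step 2: average Colors within each identifier (NA rows are dropped by default)
result <- aggregate(Colors ~ KEGGnumber, data = ColorMap, FUN = mean)
print(result)
```

The result has one row per distinct KEGGnumber, which is the uniqueness the question asks for.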