I am trying to adjust the formatting of a data set. My current set looks like this, in two columns. The first column is a "cluster" and the second column "name" contains values within each cluster:
Cluster Name
A 1
A 2
A 3
B 4
B 5
C 2
C 6
C 7
I'd like a single-column list in which all the values from column 2 appear under their associated cluster from column 1:
Cluster A
1
2
3
Cluster B
4
5
Cluster C
2
6
7
I've been trying in R and Excel with no luck for the last few hours. Any ideas?
1 Solution
#1
Using a trick with tidyr::nest:
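library(dplyr)
library(tidyr)
# Prefix each cluster label, nest Name per cluster, then transpose and flatten into one column.
# nest(data = Name) is the tidyr >= 1.0 spelling of the older nest(Name).
df %>% mutate(Cluster = paste0("Cluster_", Cluster)) %>% nest(data = Name) %>% t() %>% unlist() %>% as.data.frame()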
# .
# 1 Cluster_A
# 2 1
# 3 2
# 4 3
# 5 Cluster_B
# 6 4
# 7 5
# 8 Cluster_C
# 9 2
# 10 6
# 11 7
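
For comparison, here is a minimal base R sketch of the same reshaping that does not need tidyr. It is not part of the original answer. The data frame `df` is a reconstruction of the example in the question, and the names `groups`, `stacked`, `result` and the output column `value` are made up for illustration. It writes the headers as "Cluster A" (with a space), as in the question's desired output, rather than "Cluster_A".

```r
# Hypothetical reconstruction of the example data from the question
df <- data.frame(
  Cluster = c("A", "A", "A", "B", "B", "C", "C", "C"),
  Name    = c(1, 2, 3, 4, 5, 2, 6, 7),
  stringsAsFactors = FALSE
)

# Split Name by Cluster, keeping clusters in order of first appearance
groups <- split(df$Name, factor(df$Cluster, levels = unique(df$Cluster)))

# Put a "Cluster X" header in front of each group's values, then flatten.
# c() coerces the numbers to character, so the result is a character vector.
stacked <- unlist(
  Map(function(cl, vals) c(paste("Cluster", cl), vals), names(groups), groups),
  use.names = FALSE
)

result <- data.frame(value = stacked, stringsAsFactors = FALSE)
print(result)
```

Because headers and values share one column, everything ends up stored as character. That is fine for display or export, but the numbers would need converting back with as.numeric() before any further computation.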