Combine two columns using shared values in the first column

Time: 2021-08-27 09:10:48

I am trying to change the layout of a data set. It currently has two columns: the first, "Cluster", is a cluster label, and the second, "Name", contains the values within each cluster:

Cluster     Name
A           1
A           2
A           3
B           4
B           5
C           2
C           6
C           7

I'd like to turn this into a single column in which each cluster label is followed by all the values from column 2 that belong to that cluster:

Cluster A
1
2
3
Cluster B
4
5
Cluster C
2
6
7

I've been trying to do this in R and Excel for the last few hours with no luck. Any ideas?

1 solution

#1

Using a trick with tidyr::nest:

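library(dplyr)
library(tidyr)

# Data from the question
df <- data.frame(Cluster = c("A", "A", "A", "B", "B", "C", "C", "C"),
                 Name    = c(1, 2, 3, 4, 5, 2, 6, 7))

# Prefix each cluster label, nest the Name values per cluster, then
# transpose and flatten so each label is followed by its own values.
# (tidyr >= 1.0 expects a named selection: nest(data = Name), not nest(Name).)
df %>%
  mutate(Cluster = paste0("Cluster_", Cluster)) %>%
  nest(data = Name) %>%
  t %>%
  unlist %>%
  as.data.frame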
# .
# 1  Cluster_A
# 2          1
# 3          2
# 4          3
# 5  Cluster_B
# 6          4
# 7          5
# 8  Cluster_C
# 9          2
# 10         6
# 11         7
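For comparison, here is a minimal base-R sketch that avoids tidyr and the transpose trick. It assumes the data are in a data frame called df with the two columns from the question. split() groups Name by Cluster, and each group is prefixed with a "Cluster A"-style label (with a space, matching the layout asked for) before everything is flattened into one column. The names groups, stacked and result are just illustrative.

```r
# Data from the question
df <- data.frame(
  Cluster = c("A", "A", "A", "B", "B", "C", "C", "C"),
  Name    = c(1, 2, 3, 4, 5, 2, 6, 7)
)

# Group the Name values by Cluster. Using factor levels = unique() keeps the
# clusters in order of first appearance (a plain character vector would be
# split in sorted order instead).
groups <- split(df$Name, factor(df$Cluster, levels = unique(df$Cluster)))

# For each cluster, put a "Cluster X" label in front of its values, then
# flatten the per-cluster pieces into a single character vector.
stacked <- unlist(
  Map(function(label, values) c(paste("Cluster", label), values),
      names(groups), groups),
  use.names = FALSE
)

# One-column data frame, as in the desired output
result <- data.frame(value = stacked, stringsAsFactors = FALSE)
print(result)
```

Because the labels and the values end up in the same column, the numbers are coerced to character; convert them back with as.numeric() if you need to do arithmetic on them later.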