Dendrogram that matches cluster numbers in Python's scipy.cluster.hierarchy

Time: 2022-07-02 21:20:01

The following code generates a simple hierarchical cluster dendrogram with 10 leaf nodes:

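import numpy as np
import scipy.cluster.hierarchy as sch
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist

X = np.random.randn(10, 2)             # 10 random 2-D points
d = pdist(X)                           # condensed (1-D) pairwise distance matrix
Z = sch.linkage(d, method='complete')  # linkage matrix, one row per merge
P = sch.dendrogram(Z)
plt.show()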

I generate three flat clusters like so:

T = sch.fcluster(Z, 3, 'maxclust')
# e.g. array([3, 1, 1, 2, 2, 2, 2, 2, 1, 2]); varies because X is random
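One way to see which cluster each leaf falls into, before drawing anything, is to ask `dendrogram` for its left-to-right leaf order without plotting (`no_plot=True`, then the `'leaves'` entry of the returned dict) and read `T` in that order. This is a minimal sketch assuming the same kind of random data as above; the fixed seed and the names `order`, `clusters_in_plot_order` and `blocks` are illustrative only. Because `maxclust` cuts the tree at a single height, each flat cluster should show up as one contiguous run of leaves.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible (illustrative)
X = rng.standard_normal((10, 2))
Z = sch.linkage(pdist(X), method='complete')
T = sch.fcluster(Z, 3, 'maxclust')

# Leaf indices in left-to-right dendrogram order, computed without drawing.
order = sch.dendrogram(Z, no_plot=True)['leaves']

# Cluster number of each leaf as it appears along the x axis.
clusters_in_plot_order = T[order]
print(clusters_in_plot_order)

# Collapse consecutive repeats: one entry per visible block of leaves.
blocks = [int(c) for i, c in enumerate(clusters_in_plot_order)
          if i == 0 or c != clusters_in_plot_order[i - 1]]
print(blocks)
```

If `blocks` contains each cluster number exactly once, reading the dendrogram from left to right gives the cluster numbers in that order.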

However, I'd like to see the cluster labels 1, 2, 3 on the dendrogram. With just 10 leaf nodes and three clusters I can work out the grouping by eye, but with 1000 nodes and 10 clusters I can't tell which branches belong to which cluster.

How do I show the cluster numbers on the dendrogram? I'm open to other packages. Thanks.

1 solution

#1 (4 votes)

Here is a solution that colors each flat cluster and labels every leaf of the dendrogram with its cluster number (each leaf is labeled 'point number,cluster number'). The two techniques can be used independently or together; I modified your original example to include both:

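import numpy as np
import scipy.cluster.hierarchy as sch
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist

n = 10  # number of points
k = 3   # number of flat clusters (needs k >= 2 for the threshold below)
X = np.random.randn(n, 2)
d = pdist(X)
Z = sch.linkage(d, method='complete')
T = sch.fcluster(Z, k, 'maxclust')

# calculate labels: "point number,cluster number" for each leaf
labels = [str(i) + ',' + str(T[i]) for i in range(n)]

# calculate color threshold: the height of the (k-1)-th merge from the top.
# Merges strictly below ct are colored per subtree; the top k-1 merges are not,
# so the colors follow the k flat clusters (assuming distinct merge heights).
ct = Z[-(k-1), 2]

# plot
P = sch.dendrogram(Z, labels=labels, color_threshold=ct)
plt.show()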
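With 1000 leaves, per-leaf labels become unreadable. A variant, sketched here under the assumption that it is acceptable to hide the leaf labels and write each cluster number once under its block of leaves, relies on scipy's layout placing the i-th plotted leaf at x = 5 + 10·i. The helper `cluster_label_positions` is hypothetical, the seed is illustrative, and the Agg backend is used only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist


def cluster_label_positions(order, T):
    """Hypothetical helper: return (x_center, cluster) for each block of
    consecutive leaves sharing a flat-cluster number, in plot order.
    Assumes scipy's layout, where the i-th plotted leaf sits at x = 5 + 10*i."""
    positions = []
    start = 0
    for i in range(1, len(order) + 1):
        # Close the current block at the end or when the cluster number changes.
        if i == len(order) or T[order[i]] != T[order[start]]:
            x_center = 5 + 10 * (start + i - 1) / 2  # midpoint of first and last leaf
            positions.append((x_center, int(T[order[start]])))
            start = i
    return positions


n, k = 1000, 10
rng = np.random.default_rng(1)
X = rng.standard_normal((n, 2))
Z = sch.linkage(pdist(X), method='complete')
T = sch.fcluster(Z, k, 'maxclust')
ct = Z[-(k - 1), 2]  # same color threshold as in the answer's code

fig, ax = plt.subplots(figsize=(12, 5))
P = sch.dendrogram(Z, color_threshold=ct, no_labels=True, ax=ax)

# Write each cluster number once, just below its block of leaves (y=0 is the leaf baseline).
positions = cluster_label_positions(P['leaves'], T)
for x, c in positions:
    ax.text(x, 0, str(c), ha='center', va='top')

# fig.savefig("dendrogram_clusters.png")  # or plt.show() with an interactive backend
```

Labeling blocks instead of leaves keeps the number of annotations at k no matter how large n gets. Note that the default color cycle is short, so with many clusters some colors repeat; the printed numbers still disambiguate them.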
