How to compute cluster assignments from linkage/distance matrices in scipy in Python?

If you have this hierarchical clustering call in scipy in Python:

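from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform
# dist_matrix is the full square (N x N) distance matrix; squareform converts it to the condensed form that linkage expects
linkage_matrix = linkage(squareform(dist_matrix), linkage_method)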

Then what's an efficient way to go from this to cluster assignments for individual points? That is, a vector of length N, where N is the number of points and entry i is the cluster number of point i, given the clusters produced by cutting the resulting clustering at a threshold thresh.

To clarify: the cluster number of a point is the cluster it is in after applying a threshold to the tree. Each leaf node then gets exactly one cluster, in the sense that each point belongs to one "most specific cluster", which is defined by the threshold where you cut the dendrogram.

I know that scipy.cluster.hierarchy.fclusterdata returns this cluster assignment, but I am starting from a custom-made distance matrix and distance metric, so I cannot use fclusterdata. The question boils down to: how can I compute what fclusterdata computes, namely the cluster assignments?

2 solutions

#1

If I understand you right, that is what fcluster does:

scipy.cluster.hierarchy.fcluster(Z, t, criterion='inconsistent', depth=2, R=None, monocrit=None)

Forms flat clusters from the hierarchical clustering defined by the linkage matrix Z.

...

Returns: An array of length n. T[i] is the flat cluster number to which original observation i belongs.

So just call fcluster(linkage_matrix, t), where t is your threshold.
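
One thing to watch: as the signature above shows, the default criterion is 'inconsistent', so if you want t to be the height at which the dendrogram is cut, you probably want criterion='distance' (or criterion='maxclust' if you would rather fix the number of clusters). Here is a minimal end-to-end sketch; the toy dist_matrix, the choice of "average" linkage and the threshold 5.0 are made up for illustration and are not from the question.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical toy data: a symmetric 4x4 distance matrix with a zero diagonal.
dist_matrix = np.array([
    [0.0, 1.0, 9.0, 10.0],
    [1.0, 0.0, 8.0, 9.0],
    [9.0, 8.0, 0.0, 2.0],
    [10.0, 9.0, 2.0, 0.0],
])

# linkage expects the condensed (1-D) form, which squareform produces
# from a square matrix. "average" is an arbitrary choice for this sketch.
linkage_matrix = linkage(squareform(dist_matrix), method="average")

# Cut the dendrogram at a distance threshold: labels[i] is the flat
# cluster number of point i (numbering starts at 1).
thresh = 5.0
labels = fcluster(linkage_matrix, t=thresh, criterion="distance")

# Alternatively, ask for at most 2 flat clusters instead of a height.
labels_k = fcluster(linkage_matrix, t=2, criterion="maxclust")

print(labels)
print(labels_k)
```

With this toy matrix, points 0 and 1 should end up in one cluster and points 2 and 3 in the other, under either criterion.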

#2

If you'd like to see the members at every cluster level and the order in which they are agglomerated, see https://*.com/a/43170608/5728789
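
I can't promise this matches the linked answer, but the same information can be read straight off the linkage matrix: row i of Z merges the clusters with ids Z[i, 0] and Z[i, 1] at height Z[i, 2], and the new cluster gets id n + i, where ids below n are the original points. A rough sketch under that convention, where the toy dist_matrix and the names members and merge_log are hypothetical:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical toy data: a symmetric 4x4 distance matrix with a zero diagonal.
dist_matrix = np.array([
    [0.0, 1.0, 9.0, 10.0],
    [1.0, 0.0, 8.0, 9.0],
    [9.0, 8.0, 0.0, 2.0],
    [10.0, 9.0, 2.0, 0.0],
])

Z = linkage(squareform(dist_matrix), method="average")
n = Z.shape[0] + 1  # number of original points

# Cluster ids 0..n-1 are the original points (singletons).
members = {i: [i] for i in range(n)}

# Walk the merges in the order they happened and record who is in each new cluster.
merge_log = []
for step, (a, b, height, size) in enumerate(Z):
    new_id = n + step
    members[new_id] = members[int(a)] + members[int(b)]
    merge_log.append((float(height), sorted(members[new_id])))

for height, points in merge_log:
    print(f"merged at height {height:.2f}: {points}")
```

Each entry of merge_log is one level of the dendrogram, listed in agglomeration order.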
