I have the following points in 3D space:
我在3D空间中有以下几点:
I need to group the points, according to D_max
and d_max
:
我需要根据D_max和d_max对点进行分组:
D_max = max dimension of each group
d_max = max distance of points inside each group
Like this:
The shape of the group in the above image looks like a box, but the shape can be anything which would be the output of the grouping algorithm.
上图中组的形状看起来像一个盒子,但形状可以是任何可以作为分组算法输出的形状。
I'm using Python and visualize the results with Blender. I'm considering using the scipy.spatial.KDTree and calling its query API, however, I'm not sure if that's the right tool for the job at hand. I'm worried that there might be a better tool which I'm not aware of. I'm curious to know if there is any other tool/library/algorithm which can help me.
我正在使用Python并使用Blender可视化结果。我正在考虑使用scipy.spatial.KDTree并调用它的查询API,但是,我不确定这是否适合手头的工作。我担心可能有一个我不知道的更好的工具。我很想知道是否还有其他工具/库/算法可以帮助我。
As @CoMartel pointed out, there is DBSCAN and also HDBSCAN clustering modules which look like a good fit for this type of problems. However, as pointed out by @Paul they lack the option for max size of the cluster which correlates to my D_max
parameter. I'm not sure how to add a max cluster size feature to DBSCAN and HDBSCAN clustering.
正如@CoMartel指出的那样,DBSCAN和HDBSCAN集群模块看起来非常适合这类问题。然而,正如@Paul指出的那样,他们缺少与我的D_max参数相关的群集最大大小的选项。我不确定如何为DBSCAN和HDBSCAN群集添加最大群集大小功能。
Thanks to @Anony-Mousse I watched Agglomerative Clustering: how it works and Hierarchical Clustering 3: single-link vs. complete-link and I'm studying Comparing Python Clustering Algorithms, I feel like it's getting more clear how these algorithms work.
感谢@ Anony-Mousse,我观察了凝聚聚类:它是如何工作的和分层聚类3:单链接与完整链接,我正在研究比较Python聚类算法,我觉得这些算法的工作方式越来越清晰。
3 个解决方案
#1
1
As requested, my comment as an answer :
根据要求,我的评论作为答案:
You could use DBSCAN(http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html) or HDBSCAN.
您可以使用DBSCAN(http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html)或HDBSCAN。
Both these algorithm allow to group each point according to d_max (maximum distance between 2 points of the same dataset), but they don't take the maximum cluster size. The only way to limit the maximum size of a cluster is by reducing the eps
parameter, which control the max distance between 2 points of the same cluster.
这两种算法都允许根据d_max(同一数据集的2个点之间的最大距离)对每个点进行分组,但它们不采用最大簇大小。限制群集最大大小的唯一方法是减少eps参数,该参数控制同一群集的2个点之间的最大距离。
#2
2
Use hierarchical agglomerative clustering.
使用分层凝聚聚类。
If you use complete linkage you can control the maximum diameter of the clusters. The complete link is the maximum distance.
如果使用完整链接,则可以控制簇的最大直径。完整链接是最大距离。
DBSCAN's epsilon parameter is not a maximum distance because multiple steps are joined transitively. Clusters can become much larger than epsilon!
DBSCAN的epsilon参数不是最大距离,因为可以传递多个步骤。集群可以变得比epsilon大得多!
#3
0
DBSCAN clustering algorithm with the maximum distance of points inside each group extension
DBSCAN聚类算法,每组扩展内点的最大距离
You can use the DBSCAN algorithm recursively.
您可以递归使用DBSCAN算法。
def DBSCAN_with_max_size(myData, eps = E, max_size = S):
clusters = DBSCAN(myData, eps = E)
Big_Clusters = find_big_clusters(clusters)
for big_cluster in Big_Clusters:
DBSCAN_with_max_size(big_cluster ,eps = E/2 ,max_size = S) //eps is something lower than E (e.g. E/2)
#1
1
As requested, my comment as an answer :
根据要求,我的评论作为答案:
You could use DBSCAN(http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html) or HDBSCAN.
您可以使用DBSCAN(http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html)或HDBSCAN。
Both these algorithm allow to group each point according to d_max (maximum distance between 2 points of the same dataset), but they don't take the maximum cluster size. The only way to limit the maximum size of a cluster is by reducing the eps
parameter, which control the max distance between 2 points of the same cluster.
这两种算法都允许根据d_max(同一数据集的2个点之间的最大距离)对每个点进行分组,但它们不采用最大簇大小。限制群集最大大小的唯一方法是减少eps参数,该参数控制同一群集的2个点之间的最大距离。
#2
2
Use hierarchical agglomerative clustering.
使用分层凝聚聚类。
If you use complete linkage you can control the maximum diameter of the clusters. The complete link is the maximum distance.
如果使用完整链接,则可以控制簇的最大直径。完整链接是最大距离。
DBSCAN's epsilon parameter is not a maximum distance because multiple steps are joined transitively. Clusters can become much larger than epsilon!
DBSCAN的epsilon参数不是最大距离,因为可以传递多个步骤。集群可以变得比epsilon大得多!
#3
0
DBSCAN clustering algorithm with the maximum distance of points inside each group extension
DBSCAN聚类算法,每组扩展内点的最大距离
You can use the DBSCAN algorithm recursively.
您可以递归使用DBSCAN算法。
def DBSCAN_with_max_size(myData, eps = E, max_size = S):
clusters = DBSCAN(myData, eps = E)
Big_Clusters = find_big_clusters(clusters)
for big_cluster in Big_Clusters:
DBSCAN_with_max_size(big_cluster ,eps = E/2 ,max_size = S) //eps is something lower than E (e.g. E/2)