I have a series of 2D arrays where the rows are points in some space. Many similar points occur across all arrays, but in a different row order. I want to sort the rows so that their order is as similar as possible across arrays. The points are too different for clustering with K-means or DBSCAN. The problem can also be cast like this: if I stack the arrays into a 3D array, how do I permute the rows to minimize the average standard deviation (SD) along the 2nd axis? What's a good sorting algorithm for this problem?
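To make the objective concrete, here is a minimal sketch, assuming the arrays are stacked along the last axis into shape (N, M, K), so the SD is taken with `axis=2`. The helper name `mean_sd` and the synthetic data are illustrative only, not part of my real data set.

```python
import numpy as np

def mean_sd(stack):
    """Average SD across the K stacked arrays; stack has shape (N, M, K)."""
    return np.mean(np.std(stack, axis=2))

rng = np.random.default_rng(0)
base = rng.normal(size=(5, 3))  # N=5 points in M=3 dimensions

# K=4 noisy copies of the same points, rows already in matching order
aligned = np.stack(
    [base + 0.01 * rng.normal(size=base.shape) for _ in range(4)], axis=2
)

# Same data, but each array gets its own random row order
shuffled = aligned.copy()
for k in range(shuffled.shape[2]):
    perm = rng.permutation(shuffled.shape[0])
    shuffled[:, :, k] = aligned[:, :, k][perm]

print("aligned :", mean_sd(aligned))
print("shuffled:", mean_sd(shuffled))
```

A good row ordering should bring the score of the shuffled stack back down towards that of the aligned one.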
I've tried the following approaches.
- Create a reference 2D array and sort the rows in each array to minimize the mean Euclidean distance to the reference. I am afraid this gives biased results.
- Sort rows in arrays pairwise, then pairs of pair-medians, then pairs of those, and so on. This doesn't really work and I'm not sure why.
A third approach would be plain brute-force optimization, but I would like to avoid that since I have multiple sets of arrays to run the procedure on.
This is my code for the 2nd approach (Python):
import numpy as np


def reorder_to(A, B):
    """Reorder rows in A to best match rows in B.

    Input
    -----
    A : N x M numpy.array
    B : N x M numpy.array

    Output
    ------
    perm_order : permutation order
    """
    if A.shape != B.shape:
        print("A and B must have the same shape")
        return None
    N = A.shape[0]
    # Create a distance matrix of distance between rows in A and B
    distance_matrix = np.ones((N, N)) * np.inf
    for i, a in enumerate(A):
        for ii, b in enumerate(B):
            ba = b - a
            distance_matrix[i, ii] = np.sqrt(np.dot(ba, ba))
    # Choose permutation order by smallest distances first (greedy)
    perm_order = [[] for _ in range(N)]
    for _ in range(N):
        ind = np.argmin(distance_matrix)
        # Flat index -> (row, column); integer division keeps i an int
        i, ii = ind // N, ind % N
        perm_order[ii] = i
        # Rule out the matched row of A and row of B
        distance_matrix[i, :] = np.inf
        distance_matrix[:, ii] = np.inf
    return perm_order


def permute_tensor_rows(A):
    """Permute 1d rows in 3d array along the 0th axis to minimize average SD along 2nd axis.

    Input
    -----
    A : numpy.3darray
        Each "slice" in the 2nd direction is an independent array whose rows can be permuted
        to decrease the average SD in the 2nd direction.

    Output
    ------
    A : numpy.3darray
        A with sorted rows in each "slice". Note that A is modified in place.
    """
    step = 2
    while step <= A.shape[2]:
        for k in range(0, A.shape[2], step):
            # If last, reorder to previous
            # (A_k is still the left half from the previous iteration)
            if k + step > A.shape[2]:
                A_kk = A[:, :, k:(k+step)]
                kk_order = reorder_to(np.median(A_kk, axis=2), np.median(A_k, axis=2))
                A[:, :, k:(k+step)] = A[kk_order, :, k:(k+step)]
                continue
            # Split the block into a left and a right half (integer division)
            k_0, k_1 = k, k + step // 2
            kk_0, kk_1 = k + step // 2, k + step
            A_k = A[:, :, k_0:k_1]
            A_kk = A[:, :, kk_0:kk_1]
            # Reorder the left half to match the median of the right half
            order = reorder_to(np.median(A_k, axis=2), np.median(A_kk, axis=2))
            A[:, :, k_0:k_1] = A[order, :, k_0:k_1]
        print("Step:", step, "\t ... Average SD:", np.mean(np.std(A, axis=2)))
        step *= 2
    return A
1 solution
#1
Sorry I should have looked at your code sample; that was very informative.
It looks like scipy.optimize.linear_sum_assignment gives an out-of-the-box solution to your problem:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linear_sum_assignment.html#scipy.optimize.linear_sum_assignment
In my experience, though, it is only really feasible for a few hundred points at most.
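As a rough sketch of how that could replace the greedy matching in your `reorder_to`: build the pairwise distance matrix and let `linear_sum_assignment` pick the one-to-one matching with the smallest total distance. The function name `reorder_to_optimal` and the test data are made up for illustration, and using `scipy.spatial.distance.cdist` for the distances is my own choice, not something your code requires.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def reorder_to_optimal(A, B):
    """Return perm such that A[perm] matches the rows of B with minimal total distance."""
    if A.shape != B.shape:
        raise ValueError("A and B must have the same shape")
    # cost[j, i] = Euclidean distance between row j of B and row i of A
    cost = cdist(B, A)
    # Row indices come back sorted (0..N-1), so the column indices give,
    # for each row of B, the row of A assigned to it.
    _, perm = linear_sum_assignment(cost)
    return perm

# Illustrative data: A is a shuffled, slightly noisy copy of B
rng = np.random.default_rng(0)
B = rng.normal(size=(20, 3))
A = B[rng.permutation(20)] + 1e-3 * rng.normal(size=B.shape)

perm = reorder_to_optimal(A, B)
print(np.abs(A[perm] - B).max())  # should be on the order of the noise
```

Unlike the greedy smallest-distance-first loop, this minimizes the sum of matched distances over all permutations. It only aligns one array to another, so the question of which reference to align to remains open.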