Sort invariant for numpy.argsort with multiple dimensions

Time: 2022-10-17 21:26:10

The numpy.argsort docs state:

Returns:
index_array : ndarray, int
    Array of indices that sort a along the specified axis. If a is one-dimensional, a[index_array] yields a sorted a.

How can I apply the result of numpy.argsort for a multidimensional array to get back a sorted array? (NOT just a 1-D or 2-D array; it could be an N-dimensional array where N is known only at runtime)

>>> import numpy as np
>>> np.random.seed(123)
>>> A = np.random.randn(3,2)
>>> A
array([[-1.0856306 ,  0.99734545],
       [ 0.2829785 , -1.50629471],
       [-0.57860025,  1.65143654]])
>>> i=np.argsort(A,axis=-1)
>>> A[i]
array([[[-1.0856306 ,  0.99734545],
        [ 0.2829785 , -1.50629471]],

       [[ 0.2829785 , -1.50629471],
        [-1.0856306 ,  0.99734545]],

       [[-1.0856306 ,  0.99734545],
        [ 0.2829785 , -1.50629471]]])

For me it's not just a matter of using sort() instead; I have another array B and I want to order B using the results of np.argsort(A) along the appropriate axis. Consider the following example:

>>> A = np.array([[3,2,1],[4,0,6]])
>>> B = np.array([[3,1,4],[1,5,9]])
>>> i = np.argsort(A,axis=-1)
>>> BsortA = ???             
# should result in [[4,1,3],[5,1,9]]
# so that corresponding elements of B and sort(A) stay together

It looks like this functionality is already an enhancement request in numpy.

3 solutions

#1


2  

The numpy issue #8708 has a sample implementation of take_along_axis that does what I need; I'm not sure if it's efficient for large arrays but it seems to work.

def take_along_axis(arr, ind, axis):
    """
    ... here means a "pack" of dimensions, possibly empty

    arr: array_like of shape (A..., M, B...)
        source array
    ind: array_like of shape (A..., K..., B...)
        indices to take along each 1d slice of `arr`
    axis: int
        index of the axis with dimension M

    out: array_like of shape (A..., K..., B...)
        out[a..., k..., b...] = arr[a..., ind[a..., k..., b...], b...]
    """
    if axis < 0:
        if axis >= -arr.ndim:
            axis += arr.ndim
        else:
            raise IndexError('axis out of range')
    ind_shape = (1,) * ind.ndim
    ins_ndim = ind.ndim - (arr.ndim - 1)   #inserted dimensions

    dest_dims = list(range(axis)) + [None] + list(range(axis+ins_ndim, ind.ndim))

    # could also call np.ix_ here with some dummy arguments, then throw those results away
    inds = []
    for dim, n in zip(dest_dims, arr.shape):
        if dim is None:
            inds.append(ind)
        else:
            ind_shape_dim = ind_shape[:dim] + (-1,) + ind_shape[dim+1:]
            inds.append(np.arange(n).reshape(ind_shape_dim))

    return arr[tuple(inds)]

which yields

>>> A = np.array([[3,2,1],[4,0,6]])
>>> B = np.array([[3,1,4],[1,5,9]])
>>> i = A.argsort(axis=-1)
>>> take_along_axis(A,i,axis=-1)
array([[1, 2, 3],
       [0, 4, 6]])
>>> take_along_axis(B,i,axis=-1)
array([[4, 1, 3],
       [5, 1, 9]])
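
For readers on a recent NumPy: as far as I know, this enhancement request has since landed as np.take_along_axis (available from NumPy 1.15 on), so the hand-written helper above should only be needed on older versions. A minimal sketch using the arrays from the question:

```python
import numpy as np

A = np.array([[3, 2, 1], [4, 0, 6]])
B = np.array([[3, 1, 4], [1, 5, 9]])

# Indices that sort each row of A
i = np.argsort(A, axis=-1)

# Apply the same per-row ordering to A itself and to B
A_sorted = np.take_along_axis(A, i, axis=-1)
B_sorted_by_A = np.take_along_axis(B, i, axis=-1)

print(A_sorted)        # rows: [1 2 3], [0 4 6]
print(B_sorted_by_A)   # rows: [4 1 3], [5 1 9]
```

The index array must have the same number of dimensions as the data array, which the output of argsort always satisfies.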

#2


1  

This argsort produces a (3,2) array:

In [453]: idx=np.argsort(A,axis=-1)
In [454]: idx
Out[454]: 
array([[0, 1],
       [1, 0],
       [0, 1]], dtype=int32)

As you note, applying this to A to get the equivalent of np.sort(A, axis=-1) isn't obvious. The iterative solution is to sort each row (a 1d case) with:

In [459]: np.array([x[i] for i,x in zip(idx,A)])
Out[459]: 
array([[-1.0856306 ,  0.99734545],
       [-1.50629471,  0.2829785 ],
       [-0.57860025,  1.65143654]])

While probably not the fastest, it is probably the clearest solution, and a good starting point for conceptualizing a better solution.
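
The same row-by-row idea can be stretched to N dimensions (still with axis=-1) by first collapsing all leading axes into one. This is only a sketch; apply_argsort_loop is a name made up for illustration, and it assumes idx came from argsort along the last axis of an array with the same shape as arr:

```python
import numpy as np

def apply_argsort_loop(arr, idx):
    """Apply idx (argsort along the last axis) to arr, one 1-D slice at a time."""
    # Collapse all leading axes so that every row is one 1-D slice
    rows = arr.reshape(-1, arr.shape[-1])
    row_idx = idx.reshape(-1, idx.shape[-1])
    # The 1-D case from the docs: row[i] is the reordered row
    out = np.array([r[i] for r, i in zip(rows, row_idx)])
    return out.reshape(idx.shape)

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 3, 4))
idx = np.argsort(X, axis=-1)
result = apply_argsort_loop(X, idx)
```

It still loops in Python over every 1-D slice, so it is a reference for checking faster versions rather than something to use on large arrays.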

The tuple(inds) from the take_along_axis solution is:

(array([[0],
        [1],
        [2]]), 
 array([[0, 1],
        [1, 0],
        [0, 1]], dtype=int32))
In [470]: A[_]
Out[470]: 
array([[-1.0856306 ,  0.99734545],
       [-1.50629471,  0.2829785 ],
       [-0.57860025,  1.65143654]])

In other words:

In [472]: A[np.arange(3)[:,None], idx]
Out[472]: 
array([[-1.0856306 ,  0.99734545],
       [-1.50629471,  0.2829785 ],
       [-0.57860025,  1.65143654]])

The first part is what np.ix_ would construct, but it does not 'like' the 2d idx.
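
A small sketch of that remark, assuming the (3,2) idx shown above: np.ix_ builds the broadcastable row and column indices from 1-D inputs, but rejects a 2-D input, which is why the row index is written by hand.

```python
import numpy as np

idx = np.array([[0, 1], [1, 0], [0, 1]])

# With 1-D inputs, np.ix_ builds the broadcastable "open mesh" index arrays
rows, cols = np.ix_(np.arange(3), np.arange(2))
print(rows.shape, cols.shape)   # (3, 1) (1, 2)

# With the 2-D idx it raises ValueError, since it only accepts 1-D sequences
try:
    np.ix_(np.arange(3), idx)
    ix_accepts_2d = True
except ValueError:
    ix_accepts_2d = False

# The hand-built row index is the same array np.ix_ produced for the rows
manual_rows = np.arange(3)[:, None]
```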

Looks like I explored this topic a couple of years ago, in "argsort for a multidimensional ndarray":

a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]

I tried to explain what is going on there. The take_along_axis function does the same sort of thing, but constructs the indexing tuple for a more general case (any number of dimensions and any axis). Generalizing to more dimensions, but still with axis=-1, should be easy.
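
One way that generalization to axis=-1 might look, as a sketch only (take_along_last_axis is a hypothetical helper name, and ind is assumed to have the same shape as arr): build one arange per leading axis, shaped so that it broadcasts against ind, and use ind itself for the last axis.

```python
import numpy as np

def take_along_last_axis(arr, ind):
    n = ind.ndim
    # One arange per leading axis, shaped (len, 1, ..., 1) so that it
    # lines up with its own axis when broadcast against ind
    lead = tuple(
        np.arange(ind.shape[d]).reshape((-1,) + (1,) * (n - 1 - d))
        for d in range(n - 1)
    )
    return arr[lead + (ind,)]

rng = np.random.default_rng(1)
X = rng.standard_normal((2, 3, 4))
Y = rng.standard_normal((2, 3, 4))
order = np.argsort(X, axis=-1)
X_sorted = take_along_last_axis(X, order)   # same as np.sort(X, axis=-1)
Y_by_X = take_along_last_axis(Y, order)     # Y reordered to follow sorted X
```

For a 2-D array this reduces to the a[np.arange(n)[:, np.newaxis], np.argsort(a)] expression above.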

For the first axis, A[np.argsort(A,axis=0),np.arange(2)] works.
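
A quick check of the first-axis form, here on the (2,3) integer array from the question rather than the (3,2) random one, so the column index becomes np.arange(3):

```python
import numpy as np

A = np.array([[3, 2, 1], [4, 0, 6]])
i0 = np.argsort(A, axis=0)     # shape (2, 3): row order within each column

# The column index has shape (3,) and broadcasts against i0's shape (2, 3)
cols = np.arange(A.shape[1])
A_sorted0 = A[i0, cols]
print(A_sorted0)               # rows: [3 0 1], [4 2 6]
```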

#3


0  

We just need advanced indexing along all axes. np.ogrid can create open grids of range arrays for every axis; we then replace the entry for the input axis with the input indices and index into the data array with the resulting tuple. Essentially:

# Inputs : arr, ind, axis
# list(...) so the entry for `axis` can be replaced even if np.ogrid returns a tuple
idx = list(np.ogrid[tuple(map(slice, ind.shape))])
idx[axis] = ind
out = arr[tuple(idx)]

To make this reusable and add error checking, let's create two functions: one to build the indices, and a second to take the data array and simply index it. The point of the first function is that its indices can be reused to index any array that has the required number of dimensions and sufficient length along each axis.

Hence, the implementations would be:

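def advindex_allaxes(ind, axis):
    # Validates axis and maps negative values to 0..ndim-1.
    # (In NumPy 2.x this helper is, to my knowledge, exposed as
    # np.lib.array_utils.normalize_axis_index.)
    axis = np.core.multiarray.normalize_axis_index(axis,ind.ndim)
    # list(...) so idx[axis] can be assigned even if np.ogrid returns a tuple
    idx = list(np.ogrid[tuple(map(slice, ind.shape))])
    idx[axis] = ind
    return tuple(idx)

def take_along_axis(arr, ind, axis):
    return arr[advindex_allaxes(ind, axis)]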

Sample runs:

In [161]: A = np.array([[3,2,1],[4,0,6]])

In [162]: B = np.array([[3,1,4],[1,5,9]])

In [163]: i = A.argsort(axis=-1)

In [164]: take_along_axis(A,i,axis=-1)
Out[164]: 
array([[1, 2, 3],
       [0, 4, 6]])

In [165]: take_along_axis(B,i,axis=-1)
Out[165]: 
array([[4, 1, 3],
       [5, 1, 9]])
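
To illustrate the reuse idea on something other than the last axis, here is a self-contained sketch on a 3-D array sorted along axis=1. index_tuple_along_axis is a stand-in name for the index-building function, with the axis check written out by hand instead of relying on NumPy's internal helper:

```python
import numpy as np

def index_tuple_along_axis(ind, axis):
    # Hand-written axis normalization (an assumption of this sketch)
    if not -ind.ndim <= axis < ind.ndim:
        raise IndexError("axis out of range")
    axis %= ind.ndim
    # Open grids for every axis; list(...) so one entry can be replaced
    idx = list(np.ogrid[tuple(map(slice, ind.shape))])
    idx[axis] = ind
    return tuple(idx)

rng = np.random.default_rng(2)
X = rng.standard_normal((2, 3, 4))
Y = rng.standard_normal((2, 3, 4))

# Build the index tuple once, then reuse it for several arrays
key = index_tuple_along_axis(np.argsort(X, axis=1), axis=1)
X_sorted = X[key]   # X sorted along axis 1
Y_by_X = Y[key]     # Y reordered the same way
```

Building the tuple once avoids recomputing the grids when many arrays share the same ordering.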

Relevant one.
