找到numpy数组中最大上三角形条目索引的有效方法？

More specifically, I have a list of rows/columns that need to be ignored when choosing the max entry. In other words, when choosing the max upper triangular entry, certain indices need to skipped. In that case, what is the most efficient way to find the location of the max upper triangular entry?

更具体地说,我有一个行/列列表,在选择最大条目时需要忽略。换句话说,当选择最大上三角形条目时,需要跳过某些索引。在这种情况下,找到最大上三角形入口位置的最有效方法是什么?

For example:

>>> a
array([[0, 1, 1, 1],
       [1, 2, 3, 4],
       [4, 5, 6, 6],
       [4, 5, 6, 7]])
>>> indices_to_skip = [0,1,2]

I need to find the index of the min element among all elements in the upper triangle except for the entries a[0,1], a[0,2], and a[1,2].

我需要在上三角形中的所有元素中找到min元素的索引,除了条目a [0,1],a [0,2]和a [1,2]。

1 个解决方案

#1

You can use np.triu_indices_from:

您可以使用np.triu_indices_from:

>>> np.vstack(np.triu_indices_from(a,k=1)).T
array([[0, 1],
       [0, 2],
       [0, 3],
       [1, 2],
       [1, 3],
       [2, 3]])

>>> inds=inds[inds[:,1]>2] #Or whatever columns you want to start from.
>>> inds
array([[0, 3],
       [1, 3],
       [2, 3]])


>>> a[inds[:,0],inds[:,1]]
array([1, 4, 6])

>>> max_index = np.argmax(a[inds[:,0],inds[:,1]])
>>> inds[max_index]
array([2, 3]])

Or:

>>> inds=np.triu_indices_from(a,k=1)
>>> mask = (inds[1]>2) #Again change 2 for whatever columns you want to start at.
>>> a[inds][mask]
array([1, 4, 6])

>>> max_index = np.argmax(a[inds][mask])
>>> inds[mask][max_index]
array([2, 3]])

For the above you can use inds[0] to skip certains rows.

对于上面的内容,您可以使用inds [0]跳过某些行。

To skip specific rows or columns:

要跳过特定的行或列:

def ignore_upper(arr, k=0, skip_rows=None, skip_cols=None):
    rows, cols = np.triu_indices_from(arr, k=k)

    if skip_rows != None:
        row_mask = ~np.in1d(rows, skip_rows)
        rows = rows[row_mask]
        cols = cols[row_mask]

    if skip_cols != None:
        col_mask = ~np.in1d(cols, skip_cols)
        rows = rows[col_mask]
        cols = cols[col_mask]

    inds=np.ravel_multi_index((rows,cols),arr.shape)
    return np.take(arr,inds)

print ignore_upper(a, skip_rows=1, skip_cols=2) #Will also take numpy arrays for skipping.
[0 1 1 6 7]

The two can be combined and creative use of boolean indexing can help speed up specific cases.

这两者可以结合使用,布尔索引的创造性使用可以帮助加速特定情况。

Something interesting that I ran across, a faster way to take upper triu indices:

我跑过的有趣的东西,更快的方式来采取上部triu指数:

def fast_triu_indices(dim,k=0):

    tmp_range = np.arange(dim-k)
    rows = np.repeat(tmp_range,(tmp_range+1)[::-1])

    cols = np.ones(rows.shape[0],dtype=np.int)
    inds = np.cumsum(tmp_range[1:][::-1]+1)

    np.put(cols,inds,np.arange(dim*-1+2+k,1))
    cols[0] = k
    np.cumsum(cols,out=cols)
    return (rows,cols)

Its about ~6x faster although it does not work for k<0:

它大约快6倍,虽然它不适用于k <0:

dim=5000
a=np.random.rand(dim,dim)

k=50
t=time.time()
rows,cols=np.triu_indices(dim,k=k)
print time.time()-t
0.913508892059

t=time.time()
rows2,cols2,=fast_triu_indices(dim,k=k)
print time.time()-t
0.16515994072

print np.allclose(rows,rows2)
True

print np.allclose(cols,cols2)
True

#1