How to crop a numpy 2d array to non-zero values?

Time: 2022-01-15 21:19:22

Let's say I have a 2D boolean numpy array like this:

import numpy as np
a = np.array([
    [0,0,0,0,0,0],
    [0,1,0,1,0,0],
    [0,1,1,0,0,0],
    [0,0,0,0,0,0],
], dtype=bool)

How can I, in general, crop it to the smallest box (rectangle, kernel) that includes all True values?

So in the example above:

b = np.array([
    [1,0,1],
    [1,1,0],
], dtype=bool)

2 solutions

#1


3  

After some more fiddling with this, I actually found a solution myself:

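coords = np.argwhere(a)                # (row, col) index of every True element
x_min, y_min = coords.min(axis=0)      # first row and first column containing a True
x_max, y_max = coords.max(axis=0)      # last row and last column containing a True (inclusive)
b = cropped = a[x_min:x_max+1, y_min:y_max+1]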

The above works for boolean arrays out of the box. In case you have other conditions like a threshold t and want to crop to values larger than t, simply modify the first line:

coords = np.argwhere(a > t)
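
As a minimal sketch, the snippet above can be wrapped into a function. The name `crop_to_mask`, its `mask` parameter and the convention of returning an empty `(0, 0)` crop are assumptions made for illustration, not part of the original answer. The guard matters because `coords.min(axis=0)` raises a `ValueError` when no element satisfies the condition.

```python
import numpy as np

def crop_to_mask(a, mask=None):
    """Crop 2D array `a` to the bounding box of the True values in `mask`.

    Hypothetical helper: if `mask` is None, the truthiness of `a` is used.
    """
    a = np.asarray(a)
    if mask is None:
        mask = a.astype(bool)
    coords = np.argwhere(mask)      # (k, 2) array of (row, col) indices
    if coords.size == 0:            # nothing selected: return an empty crop
        return a[:0, :0]
    r_min, c_min = coords.min(axis=0)
    r_max, c_max = coords.max(axis=0)
    return a[r_min:r_max + 1, c_min:c_max + 1]

# Example with a threshold t on a float array
a = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.9, 0.0],
    [0.0, 0.7, 0.1, 0.0],
])
t = 0.5
cropped = crop_to_mask(a, a > t)    # rows 1..2, columns 1..2 of a
empty = crop_to_mask(np.zeros((3, 3)))
```

Note that the crop keeps every value inside the bounding box, including those not larger than `t`.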

#2


0  

Here's one that uses slicing and argmax to get the bounds -

def smallestbox(a):
    r = a.any(1)
    if r.any():
        m,n = a.shape
        c = a.any(0)
        out = a[r.argmax():m-r[::-1].argmax(), c.argmax():n-c[::-1].argmax()]
    else:
        out = np.empty((0,0),dtype=bool)
    return out
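
To see why this works: on a boolean vector, `argmax` returns the index of the first `True`, and applying it to the reversed vector gives the distance of the last `True` from the end. The sketch below illustrates this on the question's array; the variable names `first` and `stop` are chosen here for illustration only.

```python
import numpy as np

a = np.array([
    [0,0,0,0,0,0],
    [0,1,0,1,0,0],
    [0,1,1,0,0,0],
    [0,0,0,0,0,0],
], dtype=bool)

c = a.any(0)                        # which columns contain at least one True
first = c.argmax()                  # index of the first True column
stop = c.size - c[::-1].argmax()    # one past the last True column

r = a.any(1)                        # same idea along the rows
row_first = r.argmax()
row_stop = r.size - r[::-1].argmax()

box = a[row_first:row_stop, first:stop]
```

This only holds when the vector contains at least one `True`, since `argmax` on an all-False vector also returns 0; that is the reason for the `r.any()` check in `smallestbox`.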

Sample runs -

In [142]: a
Out[142]: 
array([[False, False, False, False, False, False],
       [False,  True, False,  True, False, False],
       [False,  True,  True, False, False, False],
       [False, False, False, False, False, False]])

In [143]: smallestbox(a)
Out[143]: 
array([[ True, False,  True],
       [ True,  True, False]])

In [144]: a[:] = 0

In [145]: smallestbox(a)
Out[145]: array([], shape=(0, 0), dtype=bool)

In [146]: a[2,2] = 1

In [147]: smallestbox(a)
Out[147]: array([[ True]])

Benchmarking

Other approach(es) -

def argwhere_app(a): # @Jörn Hees's soln
    coords = np.argwhere(a)
    x_min, y_min = coords.min(axis=0)
    x_max, y_max = coords.max(axis=0)
    return a[x_min:x_max+1, y_min:y_max+1]

Timings for varying degrees of sparsity (approx. 10%, 50% & 90%) -

In [370]: np.random.seed(0)
     ...: a = np.random.rand(5000,5000)>0.1

In [371]: %timeit argwhere_app(a)
     ...: %timeit smallestbox(a)
1 loop, best of 3: 310 ms per loop
100 loops, best of 3: 3.19 ms per loop

In [372]: np.random.seed(0)
     ...: a = np.random.rand(5000,5000)>0.5

In [373]: %timeit argwhere_app(a)
     ...: %timeit smallestbox(a)
1 loop, best of 3: 324 ms per loop
100 loops, best of 3: 3.21 ms per loop

In [374]: np.random.seed(0)
     ...: a = np.random.rand(5000,5000)>0.9

In [375]: %timeit argwhere_app(a)
     ...: %timeit smallestbox(a)
10 loops, best of 3: 106 ms per loop
100 loops, best of 3: 3.19 ms per loop
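
For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard `timeit` module. The smaller 1000x1000 array, the repeat counts and the use of `np.random.default_rng` are assumptions made to keep the run short, so the absolute numbers will not match the timings above.

```python
import timeit
import numpy as np

def argwhere_app(a):
    coords = np.argwhere(a)
    x_min, y_min = coords.min(axis=0)
    x_max, y_max = coords.max(axis=0)
    return a[x_min:x_max+1, y_min:y_max+1]

def smallestbox(a):
    r = a.any(1)
    if r.any():
        m, n = a.shape
        c = a.any(0)
        return a[r.argmax():m-r[::-1].argmax(), c.argmax():n-c[::-1].argmax()]
    return np.empty((0, 0), dtype=bool)

rng = np.random.default_rng(0)
results = {}
for threshold in (0.1, 0.5, 0.9):
    a = rng.random((1000, 1000)) > threshold
    # Check that both approaches return the same crop before timing them
    same = np.array_equal(argwhere_app(a), smallestbox(a))
    # Best of 3 repeats, 3 calls each, reported as seconds per call
    t_arg = min(timeit.repeat(lambda: argwhere_app(a), number=3, repeat=3)) / 3
    t_box = min(timeit.repeat(lambda: smallestbox(a), number=3, repeat=3)) / 3
    results[threshold] = (same, t_arg, t_box)
    print(f"threshold={threshold}: argwhere_app {t_arg*1e3:.2f} ms, "
          f"smallestbox {t_box*1e3:.2f} ms")
```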
