NumPy array loses its dimensions when masked

I want to select certain elements of an array and compute a weighted average based on their values. However, applying a filter condition destroys the original structure of the array: arr, which had shape (2, 2, 3, 2), is turned into a 1-dimensional array. This is of no use to me, because I don't want to combine all of these elements with each other later on, only subarrays of them. How can I avoid this flattening?

>>> arr = np.asarray([ [[[1, 11], [2, 22], [3, 33]], [[4, 44], [5, 55], [6, 66]]], [ [[7, 77], [8, 88], [9, 99]], [[0, 32], [1, 33], [2, 34] ]] ])
>>> arr
array([[[[ 1, 11],
         [ 2, 22],
         [ 3, 33]],

        [[ 4, 44],
         [ 5, 55],
         [ 6, 66]]],


       [[[ 7, 77],
         [ 8, 88],
         [ 9, 99]],

        [[ 0, 32],
         [ 1, 33],
         [ 2, 34]]]])
>>> arr.shape
(2, 2, 3, 2)
>>> arr[arr>3]
array([11, 22, 33,  4, 44,  5, 55,  6, 66,  7, 77,  8, 88,  9, 99, 32, 33,
       34])
>>> arr[arr>3].shape
(18,)

3 Solutions

#1

Check out numpy.where

http://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html

To keep the same dimensionality you need a fill value. In the example below I use 0, but you could also use np.nan (in which case the result becomes a float array, since an integer array cannot hold nan).

np.where(arr>3, arr, 0)

returns

array([[[[ 0, 11],
         [ 0, 22],
         [ 0, 33]],

        [[ 4, 44],
         [ 5, 55],
         [ 6, 66]]],


       [[[ 7, 77],
         [ 8, 88],
         [ 9, 99]],

        [[ 0, 32],
         [ 0, 33],
         [ 0, 34]]]])
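As a rough sketch of how the np.nan variant could feed into an average: the block below fills the rejected elements with nan and then averages over the last axis with np.nanmean, which skips nan entries. The choice of the last axis is only an assumption for illustration; pick whichever axis groups the elements you actually want to combine.

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

# Same shape as arr; elements failing the condition become nan (dtype float64).
filled = np.where(arr > 3, arr, np.nan)
print(filled.shape)  # (2, 2, 3, 2)

# Average each innermost pair, ignoring the nan placeholders.
# (Illustrative axis choice; every pair here has at least one valid value.)
pair_means = np.nanmean(filled, axis=-1)
print(pair_means.shape)  # (2, 2, 3)
print(pair_means)
```

Note that np.nanmean warns and returns nan for a slice that contains only nan, so an axis where every element can be filtered out needs separate handling.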

#2

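You might consider using an np.ma.masked_array to represent the subset of elements that satisfy your condition. Note that np.ma.masked_less(arr, 3), used below, masks values strictly less than 3, so the value 3 itself stays visible; np.ma.masked_less_equal(arr, 3) would correspond exactly to the arr > 3 condition in the question.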

import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

masked_arr = np.ma.masked_less(arr, 3)

print(masked_arr)
# [[[[-- 11]
#    [-- 22]
#    [3 33]]

#   [[4 44]
#    [5 55]
#    [6 66]]]


#  [[[7 77]
#    [8 88]
#    [9 99]]

#   [[-- 32]
#    [-- 33]
#    [-- 34]]]]

As you can see, the masked array retains its original dimensions. You can access the underlying data and the mask via the .data and .mask attributes respectively. Most numpy functions leave the masked values out of the computation, e.g.:

# mean of whole array
print(arr.mean())
# 26.75

# mean of non-masked elements only
print(masked_arr.mean())
# 33.4736842105
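For reference, here is a small sketch of the attributes mentioned above, plus two helpers that I find handy with masked arrays: count, which reports how many non-masked elements there are along an axis, and filled, which turns the masked array back into a plain array with a fill value. The axis and the fill value of 0 are illustrative choices only.

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

masked_arr = np.ma.masked_less(arr, 3)

# .mask is a boolean array of the same shape; True marks a masked element.
print(masked_arr.mask.shape)      # (2, 2, 3, 2)
print(int(masked_arr.mask.sum())) # number of masked elements

# .data is the untouched underlying array.
print(np.array_equal(masked_arr.data, arr))

# Number of non-masked elements along axis 2 (illustrative axis choice).
counts = masked_arr.count(axis=2)
print(counts)

# Back to a plain ndarray, with masked entries replaced by 0.
plain = masked_arr.filled(0)
print(type(plain))
```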

The result of an element-wise operation on a masked array and a non-masked array is again a masked array with the same mask:

masked_sum = masked_arr + np.random.randn(*arr.shape)

print(masked_sum)
# [[[[-- 11.359989067421582]
#    [-- 23.249092437269162]
#    [3.326111354088174 32.679132708120726]]

#   [[4.289134334263137 43.38559221094378]
#    [6.028063054523145 53.5043991898567]
#    [7.44695154979811 65.56890530368757]]]


#  [[[8.45692625294376 77.36860675985407]
#    [5.915835159196378 87.28574554110307]
#    [8.251106168209688 98.7621940026713]]

#   [[-- 33.24398289945855]
#    [-- 33.411941757624284]
#    [-- 34.964817895873715]]]]

The sum is only computed over the non-masked values of masked_arr; the masked positions keep their original values. You can see this by looking at masked_sum.data:

print(masked_sum.data)
# [[[[  1.          11.35998907]
#    [  2.          23.24909244]
#    [  3.32611135  32.67913271]]

#   [[  4.28913433  43.38559221]
#    [  6.02806305  53.50439919]
#    [  7.44695155  65.5689053 ]]]


#  [[[  8.45692625  77.36860676]
#    [  5.91583516  87.28574554]
#    [  8.25110617  98.762194  ]]

#   [[  0.          33.2439829 ]
#    [  1.          33.41194176]
#    [  2.          34.9648179 ]]]]
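Since the question asks for a weighted average, here is a minimal sketch using np.ma.average, which accepts a weights argument and leaves masked elements out of both the weighted sum and the normalisation. The weights and the choice of axis 2 are hypothetical, made up for illustration, and I use np.ma.masked_less_equal so that the mask matches the arr > 3 condition.

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

# Mask everything <= 3, i.e. keep the elements where arr > 3.
masked = np.ma.masked_less_equal(arr, 3)

# Hypothetical weights, one per position along axis 2 (length 3).
weights = np.array([1.0, 2.0, 3.0])

# Weighted average along axis 2; the result keeps the other three axes.
wavg = np.ma.average(masked, axis=2, weights=weights)

print(wavg.shape)  # (2, 2, 2)
print(wavg)
```

Slices in which every element is masked come back masked in the result, so no fill value has to be invented for them.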

#3

Look at arr>3:

In [71]: arr>3
Out[71]: 
array([[[[False,  True],
         [False,  True],
         [False,  True]],

        [[ True,  True],
         [ True,  True],
         [ True,  True]]],


       [[[ True,  True],
         [ True,  True],
         [ True,  True]],

        [[False,  True],
         [False,  True],
         [False,  True]]]], dtype=bool)

arr[arr>3] selects those elements where the mask is True. What kind of structure or shape do you want that selection to have? The number of selected elements differs from one subarray to the next (3 in some blocks, 6 in others), so they cannot form a regular multi-dimensional array. Flat is the only thing that makes sense, isn't it? arr itself is not changed.

You could zero out the terms that don't fit the mask:

In [84]: arr1=arr.copy()
In [85]: arr1[arr<=3]=0
In [86]: arr1
Out[86]: 
array([[[[ 0, 11],
         [ 0, 22],
         [ 0, 33]],

        [[ 4, 44],
         [ 5, 55],
         [ 6, 66]]],


       [[[ 7, 77],
         [ 8, 88],
         [ 9, 99]],

        [[ 0, 32],
         [ 0, 33],
         [ 0, 34]]]])

Now you could do weighted sums or averages over various dimensions.
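One way this might look, as a sketch: multiply the zero-filled array by the weights, sum over the chosen axis, and divide by the sum of the weights that belong to selected elements only, so the zeros do not drag the average down. The weights w and the choice of axis 2 are assumptions for illustration.

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

mask = arr > 3

# Zero out the terms that don't fit the mask, as above.
arr1 = arr.copy()
arr1[~mask] = 0

# Hypothetical weights along axis 2; shape (3, 1) broadcasts against (2, 2, 3, 2).
w = np.array([1.0, 2.0, 3.0])[:, None]

num = (arr1 * w).sum(axis=2)   # weighted sum of the selected elements
den = (mask * w).sum(axis=2)   # sum of the weights of the selected elements

# Divide only where at least one element was selected; leave nan elsewhere.
wavg = np.full(num.shape, np.nan)
np.divide(num, den, out=wavg, where=den > 0)

print(wavg.shape)  # (2, 2, 2)
print(wavg)
```

The nan entries mark slices where nothing passed the filter, which is easier to spot later than a misleading 0.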

np.nonzero (or np.where called with only the condition) might also be useful, giving you the indices of the selected terms:

In [88]: np.nonzero(arr>3)
Out[88]: 
(array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
 array([0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1]),
 array([0, 1, 2, 0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2, 0, 1, 2]),
 array([1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1]))
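These index arrays can also be used to group the flat selection back by subarray. The sketch below is one possible approach, not the only one: it turns the first two index arrays into a group label with np.ravel_multi_index and then sums per group with np.bincount. Grouping by the first two axes is an assumption about which subarrays belong together.

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

mask = arr > 3
idx = np.nonzero(mask)

# Indexing with the index tuple gives the same flat selection as arr[mask].
selected = arr[idx]

# One integer label per selected element, identifying its (i, j) block.
group = np.ravel_multi_index(idx[:2], arr.shape[:2])
n_groups = arr.shape[0] * arr.shape[1]

# Per-block sums and counts of the selected elements, reshaped to (2, 2).
sums = np.bincount(group, weights=selected, minlength=n_groups).reshape(arr.shape[:2])
counts = np.bincount(group, minlength=n_groups).reshape(arr.shape[:2])

# Every block has at least one selected element here, so this division is safe.
means = sums / counts
print(means)
```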
