`numpy.mean` with a tuple as the `axis` argument: not working with a masked array

I have one simple 3D array a1, and its masked analog a2:

import numpy

a1 = numpy.array([[[ 0.00,  0.00,  0.00],
                   [ 0.88,  0.80,  0.78],
                   [ 0.75,  0.78,  0.77]],

                  [[ 0.00,  0.00,  0.00],
                   [ 3.29,  3.29,  3.30],
                   [ 3.27,  3.27,  3.26]],

                  [[ 0.00,  0.00,  0.00],
                   [ 0.41,  0.42,  0.40],
                   [ 0.42,  0.43,  0.41]]])


a2 = numpy.ma.masked_equal(a1, 0.)

I want to take the mean of this array along several axes at once (this is a peculiar, undocumented use of the axis argument in numpy.mean, see e.g. here for an example):

numpy.mean(a1, axis=(0, 1))

This works fine with a1, but I get the following error with the masked array a2:

TypeError: tuple indices must be integers, not tuple

And I get the same error with the masked version numpy.ma.mean(a2, axis=(0, 1)), or if I unmask the array through a2[a2.mask]=0.

I am using a tuple for the axis argument in numpy.mean because it is not hardcoded (this command is applied to arrays with potentially different numbers of dimensions, and the tuple is adapted accordingly).

Problem encountered with numpy version 1.9.1 and 1.9.2.

1 solution

#1

For a MaskedArray argument, numpy.mean calls MaskedArray.mean, which doesn't support a tuple axis argument. You can get the correct behavior by reimplementing MaskedArray.mean in terms of operations that do support tuples for axis:

def mean(a, axis=None):
    if a.mask is numpy.ma.nomask:
        return super(numpy.ma.MaskedArray, a).mean(axis=axis)

    counts = numpy.logical_not(a.mask).sum(axis=axis)
    if counts.shape:
        sums = a.filled(0).sum(axis=axis)
        mask = (counts == 0)
        return numpy.ma.MaskedArray(data=sums * 1. / counts, mask=mask, copy=False)
    elif counts:
        # Return scalar, not array
        return a.filled(0).sum(axis=axis) * 1. / counts
    else:
        # Masked scalar
        return numpy.ma.masked

or, if you're willing to rely on MaskedArray.sum working with a tuple axis (which you likely are, given that you're using undocumented behavior of numpy.mean),

def mean(a, axis=None):
    if a.mask is numpy.ma.nomask:
        return super(numpy.ma.MaskedArray, a).mean(axis=axis)

    # MaskedArray.sum skips masked entries
    sums = a.sum(axis=axis)
    # Number of unmasked entries contributing to each sum
    counts = numpy.logical_not(a.mask).sum(axis=axis)
    return sums * 1. / counts

where we're relying on MaskedArray.sum to handle the mask.
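
To make the count-and-sum idea concrete, here is a minimal sketch that walks through the intermediate values on the array from the question, using axis=(0, 1). It only uses plain ndarray reductions, which accept a tuple axis; the variable names valid, counts, sums and means are illustrative, and numpy.ma.getmaskarray is used here (rather than a.mask) so that the mask is always a full boolean array.

```python
import numpy

a1 = numpy.array([[[0.00, 0.00, 0.00],
                   [0.88, 0.80, 0.78],
                   [0.75, 0.78, 0.77]],

                  [[0.00, 0.00, 0.00],
                   [3.29, 3.29, 3.30],
                   [3.27, 3.27, 3.26]],

                  [[0.00, 0.00, 0.00],
                   [0.41, 0.42, 0.40],
                   [0.42, 0.43, 0.41]]])
a2 = numpy.ma.masked_equal(a1, 0.)
axis = (0, 1)

# True where a value is NOT masked; getmaskarray always returns a full array
valid = ~numpy.ma.getmaskarray(a2)

# Number of unmasked values reduced into each output element
counts = valid.sum(axis=axis)

# Masked values are replaced by 0 so they do not contribute to the sum
sums = a2.filled(0).sum(axis=axis)

# Every output element has at least one unmasked value here,
# so the division is safe without extra handling
means = sums / counts
print(counts, means)
```

Each of the three output columns averages six unmasked values, so the zeros in the first row of every block no longer pull the mean down.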

I have only lightly tested these functions; before using them, make sure they actually work, and write some tests. For example, if the output is 0-dimensional and there are no masked values, whether the output is a 0D MaskedArray or a scalar depends on whether the input mask is nomask or an array of all False. This is the same as the default MaskedArray.mean behavior, but it may not be what you want; I suspect the default behavior is a bug.
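
As a starting point for such tests, here is a rough sketch. It is not the code from this answer: masked_mean is a hypothetical, compact variant of the count-and-sum approach that always returns a MaskedArray (0-dimensional for axis=None), and check_masked_mean is an illustrative name. The reference values come from the observation that, in the question's array, the masked entries are exactly a1[:, 0, :].

```python
import numpy


def masked_mean(a, axis=None):
    """Hypothetical helper: mean over unmasked entries; axis may be a tuple."""
    valid = ~numpy.ma.getmaskarray(a)
    counts = numpy.asarray(valid.sum(axis=axis))
    sums = numpy.asarray(a.filled(0).sum(axis=axis), dtype=float)
    empty = counts == 0
    # Divide by 1 where nothing is unmasked; those entries get masked anyway
    means = sums / numpy.where(empty, 1, counts)
    return numpy.ma.MaskedArray(means, mask=empty)


def check_masked_mean():
    a1 = numpy.array([[[0.00, 0.00, 0.00],
                       [0.88, 0.80, 0.78],
                       [0.75, 0.78, 0.77]],
                      [[0.00, 0.00, 0.00],
                       [3.29, 3.29, 3.30],
                       [3.27, 3.27, 3.26]],
                      [[0.00, 0.00, 0.00],
                       [0.41, 0.42, 0.40],
                       [0.42, 0.43, 0.41]]])
    a2 = numpy.ma.masked_equal(a1, 0.)
    kept = a1[:, 1:, :]  # the unmasked part of a1

    # Reductions in which every output element has unmasked input
    for axis in [(0, 1), (1, 2), None]:
        result = masked_mean(a2, axis=axis)
        assert not numpy.ma.getmaskarray(result).any()
        assert numpy.allclose(result.filled(numpy.nan), kept.mean(axis=axis))

    # With axis=(0, 2), index 0 of the remaining axis has no unmasked values
    result = masked_mean(a2, axis=(0, 2))
    assert numpy.ma.getmaskarray(result).tolist() == [True, False, False]
    assert numpy.allclose(result.compressed(), kept.mean(axis=(0, 2)))


check_masked_mean()
```

Cases worth adding for your own data include inputs whose mask is nomask, integer dtypes, and the scalar-versus-0D behavior discussed above.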
