
Time: 2022-07-09 21:41:46

From the numpy documentation on operations on masked arrays:


The numpy.ma module comes with a specific implementation of most ufuncs. Unary and binary functions that have a validity domain (such as log or divide) return the masked constant whenever the input is masked or falls outside the validity domain: e.g.:


ma.log([-1, 0, 1, 2])
masked_array(data = [-- -- 0.0 0.69314718056],
             mask = [ True  True False False],
       fill_value = 1e+20)

I have the problem that for my calculations I need to know where those invalid operations were produced. Concretely I would like this instead:


ma.log([-1, 0, 1, 2])
masked_array(data = [np.nan -- 0.0 0.69314718056],
             mask = [ True  True False False],
       fill_value = 1e+20)

At the risk of this question being conversational, my main question is:


What is a good solution to get this masked_array where the computed invalid values (those "fixed" by fix_invalid, like np.nan and np.inf) are not turned into (and conflated with) masked values?


My current solution would be to compute the function on the masked_array.data and then reconstruct the masked array with the original mask. However, I am writing an application which maps arbitrary functions from the user onto many different arrays, some of which are masked and some aren't, and I am looking to avoid a special handler just for masked arrays. Furthermore, these arrays have a distinction between MISSING, NaN, and Inf that is important so I can't just use an array with np.nans instead of masked values.
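
For reference, the workaround described above could be sketched roughly as follows. The helper name `apply_keeping_mask` is hypothetical, and this is exactly the kind of special handler I would like to avoid; it only illustrates the idea of computing on the raw data and reattaching the original mask, assuming the user function is element-wise and returns an array of the same shape.

```python
import numpy as np

def apply_keeping_mask(func, arr):
    """Apply func to the underlying data and reattach only the original mask.

    Hypothetical helper: computed nan/inf values stay in the data and are
    not turned into masked entries.
    """
    # Silence the floating-point warnings that np.log and friends would emit.
    with np.errstate(divide="ignore", invalid="ignore"):
        if np.ma.isMaskedArray(arr):
            data = func(np.ma.getdata(arr))
            return np.ma.masked_array(data, mask=np.ma.getmaskarray(arr))
        # Plain arrays (or lists) are passed through func unchanged.
        return func(np.asarray(arr))

# Second element is MISSING; the first would produce nan under log.
x = np.ma.masked_array([-1.0, 0.0, 1.0, 2.0], mask=[False, True, False, False])
y = apply_keeping_mask(np.log, x)
z = apply_keeping_mask(np.log, [1.0, 2.0])
```

Here `y` keeps the mask of `x`, and the nan computed for `-1.0` remains visible in `y.data` instead of being masked.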


Additionally, if anyone has any perspective on why this behavior exists, I would like to know. It seems strange to have this in the same operation, because the validity of the results of an operation on unmasked values is really the responsibility of the user, who can then choose to "clean up" by using the fix_invalid function.


Furthermore, if anyone knows anything about the progress of missing values in numpy, please share, as the oldest posts are from 2011-2012, when there was a debate that never resulted in anything.


EDIT: 2017-10-30

To add to hpaulj's answer: defining the log function with a modified domain has side effects on the behavior of log in the numpy namespace.


In [1]: import numpy as np

In [2]: np.log(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
/home/salotz/anaconda3/bin/ipython:1: RuntimeWarning: divide by zero encountered in log
/home/salotz/anaconda3/bin/ipython:1: RuntimeWarning: invalid value encountered in log
masked_array(data = [-- -- 0.0 0.6931471805599453],
             mask = [ True  True False False],
       fill_value = 1e+20)

In [3]: mylog = np.ma.core._MaskedUnaryOperation(np.core.umath.log)

In [4]: np.log(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
/home/salotz/anaconda3/bin/ipython:1: RuntimeWarning: divide by zero encountered in log
/home/salotz/anaconda3/bin/ipython:1: RuntimeWarning: invalid value encountered in log
masked_array(data = [-- -inf 0.0 0.6931471805599453],
             mask = [ True False False False],
       fill_value = 1e+20)

np.log now has the same behavior as mylog, but np.ma.log is unchanged:


In [5]: np.ma.log(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
masked_array(data = [-- -- 0.0 0.6931471805599453],
             mask = [ True  True False False],
       fill_value = 1e+20)

Is there a way to avoid this?


Using Python 3.6.2 :: Anaconda custom (64-bit) and numpy 1.12.1


1 Answer



Just to clarify what appears to be going on here:


np.ma.log runs np.log on the argument, but it traps the Warnings:


In [26]: np.log([-1,0,1,2])
/usr/local/bin/ipython3:1: RuntimeWarning: divide by zero encountered in log
/usr/local/bin/ipython3:1: RuntimeWarning: invalid value encountered in log
Out[26]: array([        nan,        -inf,  0.        ,  0.69314718])

It masks the nan and -inf values, and apparently copies the original values into those data slots:


In [27]: np.ma.log([-1,0,1,2])
masked_array(data = [-- -- 0.0 0.6931471805599453],
             mask = [ True  True False False],
       fill_value = 1e+20)
In [28]: _.data
Out[28]: array([-1.        ,  0.        ,  0.        ,  0.69314718])

(running in Py3; numpy version 1.13.1)


This masking behavior is not unique to ma.log. It is determined by its class:


In [41]: type(np.ma.log)
Out[41]: numpy.ma.core._MaskedUnaryOperation

In np.ma.core it is defined with fill and domain attributes:


log = _MaskedUnaryOperation(umath.log, 1.0,

So the valid domain (unmasked) is >0:


In [47]: np.ma.log.domain([-1,0,1,2])
Out[47]: array([ True,  True, False, False], dtype=bool)

That domain mask is or-ed with:

In [54]: ~np.isfinite(np.log([-1,0,1,2]))
Out[54]: array([ True,  True, False, False], dtype=bool)

which has the same values.
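
The combination of the two masks can be reproduced by hand. This is only a simplified sketch of the logic as I understand it, not the actual numpy source; the comparison `x <= 0.0` stands in for the domain check and is my assumption about what the `>0` domain amounts to.

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0, 2.0])

# Plain log, with the floating-point warnings silenced.
with np.errstate(divide="ignore", invalid="ignore"):
    raw = np.log(x)

# True where the input falls outside the valid domain (x > 0).
domain_mask = x <= 0.0
# True where the computed result is nan or +/-inf.
nonfinite_mask = ~np.isfinite(raw)

# The mask that ends up on the result is the union of both.
combined = domain_mask | nonfinite_mask

print(combined)
print(np.ma.getmaskarray(np.ma.log(x)))
```

For this input both printed masks should agree, which is why the two sources of masking cannot be told apart in the output of np.ma.log.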


Looks like I could define a custom log that does not add its own domain masking:


In [58]: mylog = np.ma.core._MaskedUnaryOperation(np.core.umath.log)
In [59]: mylog([-1,0,1,2])
masked_array(data = [        nan        -inf  0.          0.69314718],
             mask = False,
       fill_value = 1e+20)

In [63]: np.ma.masked_array([-1,0,1,2],[1,0,0,0])
masked_array(data = [-- 0 1 2],
             mask = [ True False False False],
       fill_value = 999999)
In [64]: np.ma.log(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
masked_array(data = [-- -- 0.0 0.6931471805599453],
             mask = [ True  True False False],
       fill_value = 1e+20)
In [65]: mylog(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
masked_array(data = [-- -inf 0.0 0.6931471805599453],
             mask = [ True False False False],
       fill_value = 1e+20)
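
With a result like the one from mylog, where only the originally masked entries stay masked, the MISSING, NaN and Inf cases can be separated afterwards. A minimal sketch, assuming a hypothetical helper named `classify` and a hand-built array that mimics the mylog output above:

```python
import numpy as np

def classify(result):
    """Return boolean arrays for MISSING, NaN and Inf entries (hypothetical helper)."""
    data = np.ma.getdata(result)
    missing = np.ma.getmaskarray(result)
    # Only count nan/inf in slots that are not masked as MISSING.
    is_nan = np.isnan(data) & ~missing
    is_inf = np.isinf(data) & ~missing
    return missing, is_nan, is_inf

# Hand-built stand-in: first entry MISSING, then -inf, nan and a regular value.
res = np.ma.masked_array([0.0, -np.inf, np.nan, 0.69314718],
                         mask=[True, False, False, False])
missing, is_nan, is_inf = classify(res)
```

The three boolean arrays do not overlap, so each element falls into at most one category.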


