Operations on numpy masked arrays give masked invalid values

Time: 2022-07-09 21:41:46

From the numpy documentation on operations on masked arrays:

The numpy.ma module comes with a specific implementation of most ufuncs. Unary and binary functions that have a validity domain (such as log or divide) return the masked constant whenever the input is masked or falls outside the validity domain: e.g.:

ma.log([-1, 0, 1, 2])
masked_array(data = [-- -- 0.0 0.69314718056],
             mask = [ True  True False False],
       fill_value = 1e+20)

I have the problem that for my calculations I need to know where those invalid operations were produced. Concretely I would like this instead:

ma.log([-1, 0, 1, 2])
masked_array(data = [np.nan -- 0.0 0.69314718056],
             mask = [ True  True False False],
       fill_value = 1e+20)

At the risk of making this question too conversational, my main question is:

What is a good solution to get this masked_array where the computed invalid values (those "fixed" by fix_invalid, like np.nan and np.inf) are not turned into (and conflated with) masked values?

My current solution would be to compute the function on the masked_array.data and then reconstruct the masked array with the original mask. However, I am writing an application which maps arbitrary functions from the user onto many different arrays, some of which are masked and some aren't, and I am looking to avoid a special handler just for masked arrays. Furthermore, these arrays have a distinction between MISSING, NaN, and Inf that is important so I can't just use an array with np.nans instead of masked values.
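
The approach described above (compute on the raw data, then re-attach the original mask) might look roughly like the sketch below. The helper name `apply_keeping_mask` is made up for illustration, and the sketch assumes the user function accepts a plain ndarray. It also shows how MISSING, NaN and Inf stay distinguishable afterwards.

```python
import numpy as np

def apply_keeping_mask(func, arr):
    """Apply func to the raw data and re-attach only the original mask.

    Hypothetical helper: not part of numpy.
    """
    if isinstance(arr, np.ma.MaskedArray):
        # Silence floating-point warnings; nan/inf remain in the data.
        with np.errstate(divide="ignore", invalid="ignore"):
            result = func(arr.data)
        return np.ma.masked_array(result, mask=np.ma.getmaskarray(arr))
    return func(arr)

x = np.ma.masked_array([-1.0, 0.0, 1.0, 2.0],
                       mask=[False, False, True, False])
y = apply_keeping_mask(np.log, x)

missing = np.ma.getmaskarray(y)         # entries that were masked on input
is_nan = np.isnan(y.data) & ~missing    # NaN produced by the computation
is_inf = np.isinf(y.data) & ~missing    # +/-Inf produced by the computation
```

This is still a special case for masked arrays, which is exactly what the question hopes to avoid, but it keeps the three categories separate.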


Additionally, if anyone has any perspective on why this behavior exists, I would like to know. It seems strange to do this in the same operation, because the validity of the results of an operation on unmasked values is really the responsibility of the user, who can then choose to "clean up" by using the fix_invalid function.

Furthermore, if anyone knows anything about the progress of missing values in numpy, please share, as the oldest posts are from 2011-2012, when there was a debate that never resulted in anything.


EDIT: 2017-10-30

To add to hpaulj's answer: defining the log function with a modified domain has side effects on the behavior of log in the numpy namespace.

In [1]: import numpy as np

In [2]: np.log(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
/home/salotz/anaconda3/bin/ipython:1: RuntimeWarning: divide by zero encountered in log
  #!/home/salotz/anaconda3/bin/python
/home/salotz/anaconda3/bin/ipython:1: RuntimeWarning: invalid value encountered in log
  #!/home/salotz/anaconda3/bin/python
Out[2]: 
masked_array(data = [-- -- 0.0 0.6931471805599453],
             mask = [ True  True False False],
       fill_value = 1e+20)

In [3]: mylog = np.ma.core._MaskedUnaryOperation(np.core.umath.log)

In [4]: np.log(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
/home/salotz/anaconda3/bin/ipython:1: RuntimeWarning: divide by zero encountered in log
  #!/home/salotz/anaconda3/bin/python
/home/salotz/anaconda3/bin/ipython:1: RuntimeWarning: invalid value encountered in log
  #!/home/salotz/anaconda3/bin/python
Out[4]: 
masked_array(data = [-- -inf 0.0 0.6931471805599453],
             mask = [ True False False False],
       fill_value = 1e+20)

np.log now has the same behavior as mylog, but np.ma.log is unchanged:

In [5]: np.ma.log(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
Out[5]: 
masked_array(data = [-- -- 0.0 0.6931471805599453],
             mask = [ True  True False False],
       fill_value = 1e+20)

Is there a way to avoid this?

Using Python 3.6.2 :: Anaconda custom (64-bit) and numpy 1.12.1

1 answer

#1


3  

Just to clarify what appears to be going on here:

np.ma.log runs np.log on the argument, but it traps the warnings:

In [26]: np.log([-1,0,1,2])
/usr/local/bin/ipython3:1: RuntimeWarning: divide by zero encountered in log
  #!/usr/bin/python3
/usr/local/bin/ipython3:1: RuntimeWarning: invalid value encountered in log
  #!/usr/bin/python3
Out[26]: array([        nan,        -inf,  0.        ,  0.69314718])

It masks the nan and -inf values, and apparently copies the original input values into those data slots:

In [27]: np.ma.log([-1,0,1,2])
Out[27]: 
masked_array(data = [-- -- 0.0 0.6931471805599453],
             mask = [ True  True False False],
       fill_value = 1e+20)
In [28]: _.data
Out[28]: array([-1.        ,  0.        ,  0.        ,  0.69314718])

(running in Py3; numpy version 1.13.1)

This masking behavior is not unique to ma.log. It is determined by its class:

In [41]: type(np.ma.log)
Out[41]: numpy.ma.core._MaskedUnaryOperation

In np.ma.core it is defined with fill and domain attributes:

log = _MaskedUnaryOperation(umath.log, 1.0,
                        _DomainGreater(0.0))

So the valid domain (unmasked) is >0:

In [47]: np.ma.log.domain([-1,0,1,2])
Out[47]: array([ True,  True, False, False], dtype=bool)

That domain mask is or-ed with the mask of non-finite results:

In [54]: ~np.isfinite(np.log([-1,0,1,2]))
...
Out[54]: array([ True,  True, False, False], dtype=bool)

which has the same values.
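
As a rough check of that description, the two masks can be rebuilt using only public functions and compared with what np.ma.log returns. The comparison `x <= 0` is my reading of what `_DomainGreater(0.0)` flags, so treat it as an assumption rather than the library's definition.

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0, 2.0])

# Raw result, with the floating-point warnings silenced.
with np.errstate(divide="ignore", invalid="ignore"):
    raw = np.log(x)

outside_domain = x <= 0           # assumed equivalent of the domain check
non_finite = ~np.isfinite(raw)    # nan and +/-inf in the raw result
expected_mask = outside_domain | non_finite

# Mask actually produced by the masked-array version of log.
actual_mask = np.ma.getmaskarray(np.ma.log(x))
print(expected_mask, actual_mask)
```

For log the two component masks coincide, so the or-ing looks redundant here; for other functions they need not.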

Looks like I could define a custom log that does not add its own domain masking:

In [58]: mylog = np.ma.core._MaskedUnaryOperation(np.core.umath.log)
In [59]: mylog([-1,0,1,2])
Out[59]: 
masked_array(data = [        nan        -inf  0.          0.69314718],
             mask = False,
       fill_value = 1e+20)

In [63]: np.ma.masked_array([-1,0,1,2],[1,0,0,0])
Out[63]: 
masked_array(data = [-- 0 1 2],
             mask = [ True False False False],
       fill_value = 999999)
In [64]: np.ma.log(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
Out[64]: 
masked_array(data = [-- -- 0.0 0.6931471805599453],
             mask = [ True  True False False],
       fill_value = 1e+20)
In [65]: mylog(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
Out[65]: 
masked_array(data = [-- -inf 0.0 0.6931471805599453],
             mask = [ True False False False],
       fill_value = 1e+20)
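
One caveat, raised in the question's edit: creating mylog this way also changes what plain np.log does with masked arrays. As far as I can tell, this is because the `_MaskedUnaryOperation` constructor records its domain and fill in two module-level dicts, `np.ma.core.ufunc_domain` and `np.ma.core.ufunc_fills`, keyed by the ufunc. Those are private internals and may differ between numpy versions, so the following is only a sketch of one possible workaround: save the entries for np.log before building mylog and put them back afterwards.

```python
import numpy as np
from numpy.ma import core as ma_core

# Assumption: ufunc_domain and ufunc_fills are private numpy internals
# and may change between versions.
saved_domain = ma_core.ufunc_domain.get(np.log)
saved_fill = ma_core.ufunc_fills.get(np.log)

# Building the custom operation overwrites both registry entries.
mylog = ma_core._MaskedUnaryOperation(np.log)

# Restore the original entries so np.log on masked arrays is unaffected.
ma_core.ufunc_domain[np.log] = saved_domain
ma_core.ufunc_fills[np.log] = saved_fill

x = np.ma.masked_array([-1.0, 0.0, 1.0, 2.0],
                       mask=[True, False, False, False])
with np.errstate(divide="ignore", invalid="ignore"):
    result = mylog(x)   # only the original mask should survive
```

mylog keeps its own `domain` attribute (None), so restoring the registry should not change its behavior. If touching private state is unacceptable, wrapping the plain ufunc in an ordinary function that operates on `.data` avoids the registry entirely.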
