import numpy as np
import numpy.ma as ma
"""This operates as expected with one value masked"""
a = [0., 1., 1.e20, 9.]
error_value = 1.e20
b = ma.masked_values(a, error_value)
print b
"""This does not, all values are masked """
d = [0., 1., 'NA', 9.]
error_value = 'NA'
e = ma.masked_values(d, error_value)
print e
How can I use 'nan', 'NA', 'None', or some similar value to indicate missing data?
2 answers
#1
Are you getting your data from a text file or similar? If so, I'd suggest using the genfromtxt function directly to specify your masked value:
In [148]: from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO
In [149]: f = StringIO('0.0, 1.0, NA, 9.0')
In [150]: a = np.genfromtxt(f, delimiter=',', missing_values='NA', usemask=True)
In [151]: a
Out[151]:
masked_array(data = [0.0 1.0 -- 9.0],
mask = [False False True False],
fill_value = 1e+20)
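Since the question also asks about using 'nan': a minimal sketch, assuming Python 3 and a reasonably recent NumPy, is to leave out usemask=True so that genfromtxt fills the missing float entry with its default filling value (nan for float columns), and then mask the nan entries afterwards with ma.masked_invalid.

```python
from io import StringIO

import numpy as np
import numpy.ma as ma

f = StringIO('0.0, 1.0, NA, 9.0')
# Without usemask, missing float entries are filled with nan by default
a = np.genfromtxt(f, delimiter=',', missing_values='NA')
print(a)

# masked_invalid masks the nan (and inf) entries
m = ma.masked_invalid(a)
print(m)
```

This keeps a plain float array around (with nan marking missing data) while still letting you work with a masked array when you need one.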
I think the problem in your example is that the Python list you're using to initialize the NumPy array has heterogeneous types (floats and a string). NumPy coerces all the values to strings, but the masked_values function is intended for floating-point data (it compares values within a tolerance), so applying it to a string array yields the strange results.
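A quick way to see the coercion described above (assuming Python 3, where NumPy's default string dtype is Unicode) is to inspect the array NumPy builds from the question's list:

```python
import numpy as np

d = [0., 1., 'NA', 9.]
arr = np.array(d)

# Every element, including the floats, is now a string
print(arr.dtype)     # a Unicode string dtype ('<U...')
print(arr.tolist())

# Plain string comparison still finds the placeholder
print(arr == 'NA')
```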
Here's one way to overcome this by creating an array with object dtype:
In [152]: d = np.array([0., 1., 'NA', 9.], dtype=object)
In [153]: e = ma.masked_values(d, 'NA')
In [154]: e
Out[154]:
masked_array(data = [0.0 1.0 -- 9.0],
mask = [False False True False],
fill_value = ?)
You may prefer the first solution since the result has a float dtype.
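If you do take the object-dtype route, here is a minimal sketch of converting the result to a float masked array afterwards, assuming it is acceptable to replace the masked 'NA' entries with nan before the cast:

```python
import numpy as np
import numpy.ma as ma

d = np.array([0., 1., 'NA', 9.], dtype=object)
e = ma.masked_values(d, 'NA')

# Replace the masked 'NA' entries with nan, then cast the object array to float
as_float = e.filled(np.nan).astype(float)
# Re-apply the mask on the float data: masked_invalid masks the nan entries
f = ma.masked_invalid(as_float)
print(f)
```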
#2
This solution works, but it does force the creation of a copy of the array.
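import numpy as np
import numpy.ma as ma

# The mixed list from the question becomes a string array
a = np.array([0., 1., 'NA', 9.])
a_true = (a == 'NA')       # boolean mask of the missing entries
a[a_true] = 1.e20          # stored in its string form, e.g. '1e+20'
a = a.astype(float)        # parse every entry as a float (this makes a copy)
print(a)
error_value = 1.e20
b = ma.masked_values(a, error_value)
print(b)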
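A variant of the same idea that avoids the 1e20 sentinel: build the mask from the string comparison, swap the placeholder for 'nan' so the cast to float succeeds, and pass the mask to ma.masked_array directly. This is a sketch assuming the data starts as the string array from the question; the cast still makes a copy.

```python
import numpy as np
import numpy.ma as ma

raw = np.array([0., 1., 'NA', 9.])   # string array, as in the question
missing = (raw == 'NA')              # True where data is missing

# Replace the placeholder with 'nan' so every entry parses as a float
values = np.where(missing, 'nan', raw).astype(float)
b = ma.masked_array(values, mask=missing)
print(b)
```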