How do I create a histogram of an array with masked values in Numpy?

Time: 2022-07-19 12:07:42

In Numpy 1.4.1, what is the simplest or most efficient way of calculating the histogram of a masked array? numpy.histogram and pyplot.hist do count the masked elements, by default!

The only simple solution I can think of right now involves creating a new array with the non-masked values:

histogram(m_arr[~m_arr.mask])

This is not very efficient, though, as this unnecessarily creates a new array. I'd be happy to read about better ideas!
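To make the problem concrete, here is a minimal sketch. The sample data, the mask, and the bin settings are made up for illustration, and np.ma.getmaskarray is used instead of m_arr.mask only so that the indexing also works when no element is masked.

```python
import numpy as np

# Hypothetical sample data: the last two values are masked out.
m_arr = np.ma.masked_array(
    [0.5, 1.5, 1.5, 2.5, 2.5, 2.5],
    mask=[False, False, False, False, True, True],
)

# np.histogram ignores the mask and bins the underlying data.
counts_all, edges = np.histogram(m_arr, bins=3, range=(0, 3))

# Boolean indexing keeps only the unmasked elements (this makes a copy).
keep = ~np.ma.getmaskarray(m_arr)
counts_kept, _ = np.histogram(m_arr[keep], bins=3, range=(0, 3))

print(counts_all)   # masked values are still counted in the last bin
print(counts_kept)  # masked values are excluded
```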

3 solutions

#1


13  

(Undeleting this as per discussion above...)

I'm not sure whether or not the numpy developers would consider this a bug or expected behavior. I asked on the mailing list, so I guess we'll see what they say.

Either way, it's an easy fix. Patching numpy/lib/function_base.py to use numpy.asanyarray rather than numpy.asarray on the inputs to the function will allow it to properly use masked arrays (or any other subclass of an ndarray) without creating a copy.
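As a rough illustration of the difference between the two conversion functions (the sample array here is made up), np.asarray hands back a plain ndarray without the mask, while np.asanyarray lets the MaskedArray through untouched:

```python
import numpy as np

# Hypothetical masked array with the middle element masked.
m_arr = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

plain = np.asarray(m_arr)     # base-class ndarray: the mask is dropped
kept = np.asanyarray(m_arr)   # ndarray subclasses pass through unchanged

print(type(plain).__name__)
print(type(kept).__name__)
print(kept is m_arr)
```

This only shows what each function returns; whether the rest of the histogram code would then treat the mask sensibly is a separate question.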

Edit: It seems like it is expected behavior. As discussed here:

If you want to ignore masked data, it's just one extra function call

histogram(m_arr.compressed())

I don't think the fact that this makes an extra copy will be relevant, because I guess full masked array handling inside histogram will be a lot more expensive.
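A small sketch of the compressed() route, with made-up sample data and bin settings: compressed() returns the unmasked values as a 1-D array, which can be passed straight to numpy.histogram.

```python
import numpy as np

# Hypothetical 2-D masked array; the last two values are masked.
m_arr = np.ma.masked_array(
    [[0.5, 1.5, 1.5], [2.5, 2.5, 2.5]],
    mask=[[False, False, False], [False, True, True]],
)

# compressed() flattens the array and keeps only the unmasked values.
values = m_arr.compressed()

counts, edges = np.histogram(values, bins=3, range=(0, 3))
print(values)
print(counts)
```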

Using asanyarray would also let in matrices and other subtypes that might not be handled correctly by the histogram calculations.

For anything else besides dropping masked observations, it would be necessary to figure out what the masked array definition of a histogram is, as Bruce pointed out.

#2


7  

Try hist(m_arr.compressed()).

#3


3  

This is a super old question, but these days I just use:

numpy.histogram(m_arr, bins=.., range=.., density=False, weights=m_arr_mask)

Here m_arr_mask is an array with the same shape as m_arr, holding 0 for elements of m_arr that are to be excluded from the histogram and 1 for elements that are to be included.
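A minimal sketch of the weights approach, assuming m_arr is a numpy.ma masked array. The sample data and the bin settings are made up, and building m_arr_mask from np.ma.getmaskarray is just one way to get it. Note that the 0/1 convention here is the inverse of m_arr.mask, where True marks an excluded element.

```python
import numpy as np

# Hypothetical sample data: the last two values are masked out.
m_arr = np.ma.masked_array(
    [0.5, 1.5, 1.5, 2.5, 2.5, 2.5],
    mask=[False, False, False, False, True, True],
)

# 1 for elements to include, 0 for elements to exclude.
# Cast to integers so the weighted counts come back as integers.
m_arr_mask = (~np.ma.getmaskarray(m_arr)).astype(int)

counts, edges = np.histogram(
    m_arr, bins=3, range=(0, 3), density=False, weights=m_arr_mask
)
print(counts)
```

If range is left out, the bin edges are still derived from all the underlying values, masked ones included, so passing an explicit range is probably the safer choice with this approach.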
