Numpy: Average over one dimension in a "jagged" 3D array

Suppose I have an N*M*X-dimensional array "data", where N and M are fixed, but X is variable for each entry data[n][m].

(Edit: To clarify, I just used np.array() on the 3D python list which I used for reading in the data, so the numpy array is of dimensions N*M and its entries are variable-length lists)

I'd now like to compute the average over the X-dimension, so that I'm left with an N*M-dimensional array. Using np.average or np.mean with the axis argument doesn't work, so right now I'm just iterating over N and M and appending the manually computed average to a new list, but that doesn't feel very "Pythonic":

avgData=[]
for n in data:
    temp=[]
    for m in n:
        temp.append(np.average(m))
    avgData.append(temp)

Am I missing something obvious here? I'm trying to freshen up my python skills while I'm at it, so interesting/varied responses are more than welcome! :)

Thanks!

2 solutions

#1

What about using np.vectorize:

# Wrap np.average so it is applied to each list-valued element of the object array
do_avg = np.vectorize(np.average)
data_2d = do_avg(data)
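
To try this on a small case, here is a minimal sketch. The sample values and the way the object array is built are assumptions for illustration, not the asker's actual data. Passing otypes=[float] is optional; it just states the output dtype explicitly instead of letting np.vectorize infer it from the first element.

```python
import numpy as np

# Hypothetical jagged data: a 2x2 grid whose cells are variable-length lists
cells = [[1, 2, 3], [0, 3, 2, 4], [0, 2], [1]]

# Build an object array one cell at a time so each list stays a single element
data = np.empty(len(cells), dtype=object)
for i, cell in enumerate(cells):
    data[i] = cell
data = data.reshape(2, 2)

# np.vectorize calls np.average once per element of the object array
do_avg = np.vectorize(np.average, otypes=[float])
data_2d = do_avg(data)
print(data_2d)
```

Note that np.vectorize is a convenience wrapper around a Python-level loop, so it mainly tidies up the code rather than speeding it up.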

#2

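import numpy as np

# dtype=object is needed so numpy keeps the variable-length lists as elements
data = np.array([[1,2,3],[0,3,2,4],[0,2],[1]], dtype=object).reshape(2,2)
avg = np.zeros(data.shape)
# fill the preallocated float array with the mean of each list
avg.flat = [np.average(x) for x in data.flat]
print(avg)
# [[2.   2.25]
#  [1.   1.  ]]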

This still iterates over the elements of data (nothing un-Pythonic about that). But since there's nothing special about the shape or axes of data, I'm just using data.flat. While appending is the natural way to build up a Python list, with numpy it is better to assign values to the elements of an existing array.

There are fast numeric methods to work with numpy arrays, but most (if not all) work with simple numeric dtypes. Here the array elements are objects (either lists or arrays), so numpy has to resort to the usual Python iteration and list operations.
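
If a plain numeric dtype is wanted, one possible workaround (not part of the answer above, and only a sketch) is to pad every cell with NaN up to the longest length and then use np.nanmean along the last axis. The name ragged and the sample values are assumptions for illustration. The fill step still loops in Python; only the averaging itself runs on a regular float array.

```python
import numpy as np

# Hypothetical jagged input: 2 x 2 grid of variable-length lists
ragged = [[[1, 2, 3], [0, 3, 2, 4]],
          [[0, 2], [1]]]

n_rows = len(ragged)
n_cols = len(ragged[0])
max_len = max(len(cell) for row in ragged for cell in row)

# Regular float array, pre-filled with NaN as the padding value
padded = np.full((n_rows, n_cols, max_len), np.nan)
for i, row in enumerate(ragged):
    for j, cell in enumerate(row):
        padded[i, j, :len(cell)] = cell

# nanmean ignores the NaN padding when averaging over the last axis
avg_padded = np.nanmean(padded, axis=2)
print(avg_padded)
```

This trades memory for a regular layout, and an empty cell would come out as NaN with a runtime warning, so it fits best when the lengths are similar and none are zero.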

For this small example, this solution is a bit faster than Zwicker's vectorize. For larger data the two solutions take about the same time.
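
A rough way to reproduce such a comparison is sketched below with timeit. The helper name flat_avg and the repetition count are assumptions, and the actual timings will depend on the machine and the numpy version, so no numbers are shown.

```python
import timeit
import numpy as np

# Same small jagged example, built as a 2x2 object array
cells = [[1, 2, 3], [0, 3, 2, 4], [0, 2], [1]]
data = np.empty(len(cells), dtype=object)
for i, cell in enumerate(cells):
    data[i] = cell
data = data.reshape(2, 2)

# Approach 1: np.vectorize around np.average
do_avg = np.vectorize(np.average, otypes=[float])

# Approach 2 (hypothetical helper name): fill a preallocated array via .flat
def flat_avg(arr):
    out = np.zeros(arr.shape)
    out.flat = [np.average(x) for x in arr.flat]
    return out

t_vec = timeit.timeit(lambda: do_avg(data), number=1000)
t_flat = timeit.timeit(lambda: flat_avg(data), number=1000)
print("vectorize: %.4f s, flat: %.4f s" % (t_vec, t_flat))
```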
