How do I get the mean of each corresponding column of a nested numpy array?

Time: 2022-07-19 12:07:48

I am having trouble performing a column-wise operation on each column of a 2-D numpy array. I am trying to adapt my case to this answer, though my setup is different. My actual dataset is quite large and involves multiple resamplings, hence the structure of the example below. If the code and explanation look too long, consider skipping ahead to the header Relevant.

Skippable (Only here to reproduce zs below)

Consider an (x_n, y_n) dataset where n = 0, 1, or 2.

import numpy as np

def get_xy(num, size=10):
    ## (x0, y0), (x1, y1), (x2, y2) where xi, yi are both arrays
    if num == 0:
        x = np.linspace(7, size+6, size)
        y = np.linspace(3, size+2, size)
    elif num == 1:
        x = np.linspace(5, size+4, size)
        y = np.linspace(2, size+1, size)
    elif num == 2:
        x = np.linspace(4, size+3, size)
        y = np.linspace(1, size, size)
    return x, y

Suppose we can calculate some metric z_n given arrays x_n and y_n.

def get_single_z(x, y, constant=2):
    deltas = [x[i] - y[i] for i in range(len(x)) if len(x) == len(y)]
    return constant * np.array(deltas)

Instead of calculating each z_n individually, we can calculate all z_n's at once.

def get_all_z(constant=2):
    zs = []
    for num in range(3): ## 0, 1, 2
        xs, ys = get_xy(num)
        zs.append(get_single_z(xs, ys, constant))
    zs = np.array(zs)
    return zs

Relevant:

zs = get_all_z()
print(zs)
>> [[ 8.  8.  8.  8.  8.  8.  8.  8.  8.  8.]
    [ 6.  6.  6.  6.  6.  6.  6.  6.  6.  6.]
    [ 6.  6.  6.  6.  6.  6.  6.  6.  6.  6.]]

For my purpose, I'd like to make a new list or array vs for which the value at each index is equal to the average of the values in the corresponding columns of zs. For this case, every element of vs would be identical (since each operation would be the average of [8, 6, 6]). But had the first element of the first sub-array been a 10 instead of an 8, then the first element of vs would be the average of [10, 6, 6].

Unsuccessful Attempt:

def get_avg_per_col(z):
    ## column ?= axis number
    return [np.mean(z, axis=i) for i in range(len(zs[0]))]

print(get_avg_per_col(zs))
Traceback (most recent call last):...
...line 50, in _count_reduce_items ## of numpy code, not my code
    items *= arr.shape[ax]
IndexError: tuple index out of range

1 Solution

#1


You can use np.mean on the transposed zs to get the column-wise mean.

In [49]: import numpy as np

In [53]: zs = np.array([[ 8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.],
    ...:  [ 6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.],
    ...:  [ 6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.]])

In [54]: np.mean(zs.T, axis=1)
Out[54]: 
array([ 6.66666667,  6.66666667,  6.66666667,  6.66666667,  6.66666667,
        6.66666667,  6.66666667,  6.66666667,  6.66666667,  6.66666667])
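As a small sketch of the same idea, taking the mean along axis=0 of the untransposed array should give the same result, since axis=0 collapses the rows and leaves one value per column. The example below rebuilds zs by hand (rather than calling get_all_z) and applies the hypothetical change mentioned in the question, where the first element of the first sub-array is 10 instead of 8. It also hints at why the attempt in the question failed: axis selects a dimension of the array, not a column, and a 2-D array only has axes 0 and 1, so the loop breaks once it reaches axis=2.

```python
import numpy as np

# Rebuild zs by hand: three rows of ten values each, as printed in the question.
zs = np.array([[8.0] * 10, [6.0] * 10, [6.0] * 10])

# Hypothetical change from the question: first element of the first row is 10.
zs[0, 0] = 10.0

# axis=0 averages down the rows, producing one mean per column.
vs = np.mean(zs, axis=0)

# Equivalent to the transposed form used in the answer.
vs_transposed = np.mean(zs.T, axis=1)

print(vs.shape)                        # one entry per column
print(vs[0])                           # mean of [10, 6, 6]
print(np.allclose(vs, vs_transposed))  # both forms agree

# zs has only two dimensions, so valid axis values are 0 and 1
# (or their negative counterparts); axis=2 does not exist.
print(zs.ndim)
```

If a plain list is needed instead of an array, vs.tolist() converts the result.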
