Element-wise aggregation (mean) of values in a list of numpy arrays with the same shape [duplicate]

This question already has an answer here:

Numpy mean and std over every terms of arrays (1 answer)

I have a list of numpy arrays. I want to calculate the average of values in these arrays. For example:

import numpy as np
arrays = [np.random.random((4,2)) for _ in range(3)]

How can I get the element-wise average of the arrays in this list?

That is, I want the result to have shape (4, 2), where each element is the average of the corresponding elements of the arrays in the list. I know I can write a for loop to achieve this, but there should be a better NumPy way.
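
For comparison, the explicit for loop mentioned above might look like the following minimal sketch (mean_with_loop is a hypothetical helper name, not part of NumPy). It keeps a running float sum with the same shape as one array and divides by the number of arrays at the end.

```python
import numpy as np


def mean_with_loop(arrays):
    """Element-wise mean of equally shaped arrays (hypothetical helper)."""
    # Float accumulator with the same shape as the first array
    total = np.zeros_like(arrays[0], dtype=float)
    for a in arrays:
        total += a  # add corresponding elements
    return total / len(arrays)


arrays = [np.random.random((4, 2)) for _ in range(3)]
result = mean_with_loop(arrays)
print(result.shape)  # (4, 2)
```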

2 solutions

#1

Use the functional form of np.mean:

>>> import numpy as np
>>> arrays = [np.random.random((4,2)) for _ in range(3)]
>>> np.mean(arrays, axis=0)

This converts your list of arrays to a 3D array of shape (3, 4, 2) and then takes the mean along axis 0.
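
To make that conversion explicit, here is a small sketch using np.stack (available since NumPy 1.10), which builds the same (3, 4, 2) array that np.mean creates from a list of equally shaped arrays, followed by a spot check of one element against a hand-computed average.

```python
import numpy as np

arrays = [np.random.random((4, 2)) for _ in range(3)]

# Stack the list along a new leading axis: shape (3, 4, 2)
stacked = np.stack(arrays)
print(stacked.shape)  # (3, 4, 2)

# Averaging over axis 0 collapses the "list" axis, leaving shape (4, 2)
mean = stacked.mean(axis=0)
print(mean.shape)  # (4, 2)

# Element [i, j] is the average of arrays[0][i, j], arrays[1][i, j], arrays[2][i, j]
i, j = 1, 0
manual = (arrays[0][i, j] + arrays[1][i, j] + arrays[2][i, j]) / 3
print(np.isclose(mean[i, j], manual))  # True
```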

You can also use Python's sum:

>>> sum(arrays)/len(arrays)

For small lists like your example this is actually faster.
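
Python's built-in sum starts from the integer 0 and adds the arrays one after another; 0 + arrays[0] broadcasts to a (4, 2) array, so the result is the element-wise total. A quick sketch checking that it matches np.mean (compared with np.allclose rather than ==, since the floating-point summation order may differ):

```python
import numpy as np

arrays = [np.random.random((4, 2)) for _ in range(3)]

# sum() computes ((0 + arrays[0]) + arrays[1]) + arrays[2];
# the integer start value 0 is broadcast against the first array
total = sum(arrays)
print(total.shape)  # (4, 2)

avg_python = total / len(arrays)
avg_numpy = np.mean(arrays, axis=0)
print(np.allclose(avg_python, avg_numpy))  # True
```

One plausible reason for the speed difference on small lists is that np.mean must first convert the list into a single new 3D array before reducing it, a fixed overhead that matters most when the inputs are tiny.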

Some timings (m is the length of the list, each array has shape (n, k), and times are per call):

m: 3   n:4   k: 2
numpy                 0.01291340 ms
python                0.00295936 ms
m: 10   n:100   k: 100
numpy                 0.14189354 ms
python                0.09465128 ms
m: 1000   n:10   k: 10
numpy                 0.43023768 ms
python                0.45201713 ms

Benchmarking code:

import numpy as np

from timeit import timeit
import types

def setup(m, n, k):
    return list(np.random.random((m, n, k)))

def f_numpy(a):
    return np.mean(a, axis=0)

def f_python(a):
    return sum(a)/len(a)

for args in [(3, 4, 2), (10, 100, 100), (1000, 10, 10)]:
    data = setup(*args)
    print('m: {}   n:{}   k: {}'.format(*args))
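    # Run every module-level function whose name starts with "f_"
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        # timeit returns the total seconds for 1000 calls, i.e. milliseconds per call
        print("{:16s}{:16.8f} ms".format(name[2:], timeit(
            'f(data)', globals={'f':func, 'data':data}, number=1000)))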

#2

np.nanmean ignores NaN entries, so it still works if some values are missing (stored as NaN) in the data:

np.nanmean(arrays, axis=0)
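
A small sketch with hand-picked values showing the difference: plain np.mean propagates NaN, while np.nanmean averages only the non-NaN entries at each position.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[3.0, np.nan], [5.0, 6.0]])
arrays = [a, b]

# np.mean propagates NaN: element [0, 1] becomes nan,
# the other elements are 2.0, 4.0 and 5.0
plain = np.mean(arrays, axis=0)
print(plain)

# np.nanmean skips the NaN, so element [0, 1] is the mean of the single
# remaining value, 2.0
ignoring_nan = np.nanmean(arrays, axis=0)
print(ignoring_nan)
```

If every value at some position is NaN, np.nanmean returns nan there and emits a RuntimeWarning ("Mean of empty slice").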
