
Time: 2020-12-18 12:06:54



I have a list of numpy arrays. I want to calculate the average of values in these arrays. For example:


import numpy as np
arrays = [np.random.random((4,2)) for _ in range(3)]

How can I compute the element-wise average of the arrays in this list?


That is, I want the result to have shape (4, 2), where each element is the average of the elements at the corresponding index across the arrays in the list. I know I can write a for loop to do this, but there should be a better NumPy way.


2 Solutions



Use the functional form of np.mean:


>>> import numpy as np
>>> arrays = [np.random.random((4,2)) for _ in range(3)]
>>> np.mean(arrays, axis=0)

This converts your list of arrays to a 3D array of shape (3, 4, 2) and then takes the mean along axis 0.
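To make that conversion visible, here is a minimal sketch that stacks the list explicitly with np.stack before averaging. The fixed seed and the variable names (rng, stacked, result, same) are assumptions added for illustration; they are not part of the original answer.

```python
import numpy as np

# Fixed seed so the sketch is reproducible (an assumption, not in the original).
rng = np.random.default_rng(0)
arrays = [rng.random((4, 2)) for _ in range(3)]

# np.stack makes the implicit list-to-array conversion explicit.
stacked = np.stack(arrays)        # shape (3, 4, 2)
result = stacked.mean(axis=0)     # shape (4, 2)

# Passing the list straight to np.mean should give the same values.
same = np.allclose(result, np.mean(arrays, axis=0))
print(stacked.shape, result.shape, same)
```

Axis 0 is the axis that indexes the list, so averaging along it collapses the three arrays into one while keeping the (4, 2) layout.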


You can also use Python's sum:


>>> sum(arrays)/len(arrays)

For small lists like your example this is actually faster.
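As a quick check that the two approaches agree, the sketch below uses small hand-written arrays (hypothetical values chosen so the averages are easy to verify by eye) and compares sum(arrays)/len(arrays) with np.mean.

```python
import numpy as np

# Hypothetical input: three 2x2 arrays with easy-to-check values.
arrays = [np.array([[1.0, 2.0], [3.0, 4.0]]),
          np.array([[3.0, 4.0], [5.0, 6.0]]),
          np.array([[5.0, 6.0], [7.0, 8.0]])]

# Built-in sum adds the arrays element-wise, starting from 0.
avg_sum = sum(arrays) / len(arrays)
avg_mean = np.mean(arrays, axis=0)

print(avg_sum)
print(np.allclose(avg_sum, avg_mean))
```

Both expressions assume every array in the list has the same shape.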


Some timings (m is the length of the list; each array has shape (n, k)):


m: 3   n:4   k: 2
numpy                 0.01291340 ms
python                0.00295936 ms
m: 10   n:100   k: 100
numpy                 0.14189354 ms
python                0.09465128 ms
m: 1000   n:10   k: 10
numpy                 0.43023768 ms
python                0.45201713 ms

Benchmarking code:

import numpy as np

from timeit import timeit
import types

def setup(m, n, k):
    return list(np.random.random((m, n, k)))

def f_numpy(a):
    return np.mean(a, axis=0)

def f_python(a):
    return sum(a)/len(a)

for args in [(3, 4, 2), (10, 100, 100), (1000, 10, 10)]:
    data = setup(*args)
    print('m: {}   n:{}   k: {}'.format(*args))
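    for name, func in list(globals().items()):
        # Skip everything except the f_* benchmark functions.
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        # 1000 runs timed in seconds equals milliseconds per call.
        print("{:16s}{:16.8f} ms".format(name[2:], timeit(
            'f(data)', globals={'f':func, 'data':data}, number=1000)))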



np.nanmean also works when the data contains missing values (NaN), which it ignores when computing the average:

np.nanmean(arrays, axis=0)
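The difference from np.mean can be seen in a small sketch. The input values and the names plain and robust are hypothetical, chosen only to show how a single NaN affects each function.

```python
import numpy as np

# Hypothetical input: the first array has one missing value.
arrays = [np.array([[1.0, np.nan], [3.0, 4.0]]),
          np.array([[3.0, 2.0], [5.0, 8.0]])]

plain = np.mean(arrays, axis=0)       # NaN propagates into the result
robust = np.nanmean(arrays, axis=0)   # NaN is skipped at that position

print(plain)
print(robust)
```

At a position where every array holds NaN, np.nanmean still returns NaN and emits a RuntimeWarning, so it helps only when at least one valid value is present.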


