This question already has an answer here:
- Numpy mean and std over every terms of arrays 1 answer
I have a list of numpy arrays. I want to calculate the average of values in these arrays. For example:
import numpy as np
arrays = [np.random.random((4,2)) for _ in range(3)]
How can I get the element-wise average of the arrays in this list?
That is, I want the result to have shape (4,2), where each element is the average of the values at the corresponding index across the arrays in the list. I know I can write a for loop to achieve this, but there should be a better numpy way.
2 answers
#1
1
Use the functional form of np.mean:
>>> import numpy as np
>>> arrays = [np.random.random((4,2)) for _ in range(3)]
>>> np.mean(arrays, axis=0)
This converts your list of arrays to a 3D array of shape (3, 4, 2) and then takes the mean along axis 0.
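To make that implicit conversion visible, here is a minimal sketch that stacks the list explicitly with np.stack before averaging. The names stacked and result are illustrative and not part of the original answer.

```python
import numpy as np

arrays = [np.random.random((4, 2)) for _ in range(3)]

# Stack the list into a single array; the list index becomes axis 0.
stacked = np.stack(arrays, axis=0)
print(stacked.shape)  # (3, 4, 2)

# Averaging over axis 0 collapses the list dimension.
result = stacked.mean(axis=0)
print(result.shape)  # (4, 2)

# Same values as passing the list directly to np.mean.
print(np.allclose(result, np.mean(arrays, axis=0)))  # True
```

Passing the list straight to np.mean is shorter; the explicit stack is mainly useful when you want to inspect or reuse the 3D array.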
You can also use Python's built-in sum:
>>> sum(arrays)/len(arrays)
For small lists like your example this is actually faster.
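As a quick sanity check that the two approaches agree, the following sketch uses arrays with known constant values (an assumption made here so the result is easy to verify by hand; the names mean_python and mean_numpy are illustrative).

```python
import numpy as np

# Three small arrays filled with 1.0, 2.0 and 6.0 respectively.
arrays = [np.full((4, 2), v) for v in (1.0, 2.0, 6.0)]

# sum() starts from 0 and adds the arrays element-wise.
mean_python = sum(arrays) / len(arrays)
mean_numpy = np.mean(arrays, axis=0)

print(mean_python[0, 0])  # 3.0
print(np.allclose(mean_python, mean_numpy))  # True
```

Note that this relies on every array in the list having the same shape (or at least broadcastable shapes), since sum adds them pairwise.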
Some timings (m is the length of the list; each array has shape (n, k)):
m: 3 n:4 k: 2
numpy 0.01291340 ms
python 0.00295936 ms
m: 10 n:100 k: 100
numpy 0.14189354 ms
python 0.09465128 ms
m: 1000 n:10 k: 10
numpy 0.43023768 ms
python 0.45201713 ms
Benchmarking code:
import numpy as np
from timeit import timeit
import types

def setup(m, n, k):
    return list(np.random.random((m, n, k)))

def f_numpy(a):
    return np.mean(a, axis=0)

def f_python(a):
    return sum(a)/len(a)

for args in [(3, 4, 2), (10, 100, 100), (1000, 10, 10)]:
    data = setup(*args)
    print('m: {} n:{} k: {}'.format(*args))
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        print("{:16s}{:16.8f} ms".format(name[2:], timeit(
            'f(data)', globals={'f':func, 'data':data}, number=1000)))
#2
0
np.nanmean also works when the data contains missing values (NaN), which it ignores when averaging:
np.nanmean(arrays, axis=0)
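A small sketch of the difference, using two hand-made arrays (a and b are illustrative and not from the original answer):

```python
import numpy as np

a = np.array([[1.0, np.nan], [3.0, 4.0]])
b = np.array([[3.0, 2.0], [np.nan, 8.0]])
arrays = [a, b]

# np.mean propagates NaN: any position with a NaN input becomes NaN.
plain = np.mean(arrays, axis=0)

# np.nanmean skips NaNs and averages the remaining values per position.
robust = np.nanmean(arrays, axis=0)

print(plain)
print(robust)
```

If every array has NaN at the same position, np.nanmean still returns NaN there and emits a RuntimeWarning, so it helps only when at least one value is present.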