What would be the fastest way to merge a list of numpy arrays into one array, if one knows the length of the list and the size of the arrays, which is the same for all of them?
I tried two approaches:
- merged_array = array(list_of_arrays), from "Pythonic way to create a numpy array from a list of numpy arrays"
- vstack
As you can see, vstack is faster, but for some reason the first run takes three times longer than the second. I assume this is caused by (missing) preallocation. So how would I preallocate an array for vstack? Or do you know a faster method?
Thanks!
[UPDATE]
I want (25600, 320), not (80, 320, 320), which means merged_array = array(list_of_arrays) won't work for me. Thanks Joris for pointing that out!
Output:
0.547468900681 s merged_array = array(first_list_of_arrays)
0.547191858292 s merged_array = array(second_list_of_arrays)
0.656183958054 s vstack first
0.236850976944 s vstack second
Code:
import numpy
import time

width = 320
height = 320
n_matrices = 80

# Build two independent lists of 80 float32 matrices holding values 0..9.
secondmatrices = list()
for i in range(n_matrices):
    temp = numpy.random.rand(height, width).astype(numpy.float32)
    secondmatrices.append(numpy.round(temp*9))

firstmatrices = list()
for i in range(n_matrices):
    temp = numpy.random.rand(height, width).astype(numpy.float32)
    firstmatrices.append(numpy.round(temp*9))

# Approach 1: numpy.array on the whole list, result shape (80, 320, 320).
t1 = time.time()
first1 = numpy.array(firstmatrices)
print("%s s merged_array = array(first_list_of_arrays)" % (time.time() - t1))

t1 = time.time()
second1 = numpy.array(secondmatrices)
print("%s s merged_array = array(second_list_of_arrays)" % (time.time() - t1))

# Approach 2: vstack one matrix at a time, result shape (25600, 320).
# Note: pop() empties the lists, and every vstack call copies the growing result.
t1 = time.time()
first2 = firstmatrices.pop()
for i in range(len(firstmatrices)):
    first2 = numpy.vstack((firstmatrices.pop(), first2))
print("%s s vstack first" % (time.time() - t1))

t1 = time.time()
second2 = secondmatrices.pop()
for i in range(len(secondmatrices)):
    second2 = numpy.vstack((secondmatrices.pop(), second2))
print("%s s vstack second" % (time.time() - t1))
1 Answer
#1
You have 80 arrays of 320x320? Then you probably want to use dstack:
first3 = numpy.dstack(firstmatrices)
This returns a single 3-D array like numpy.array(firstmatrices) does, but stacked along the third axis, so the shape is 320x320x80 rather than 80x320x320:
timeit numpy.dstack(firstmatrices)
10 loops, best of 3: 47.1 ms per loop
timeit numpy.array(firstmatrices)
1 loops, best of 3: 750 ms per loop
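To double-check which axis each call adds, here is a minimal sketch. It uses a small stand-in list (4 arrays of 3x5 rather than 80 of 320x320), and the names matrices, as_array and as_dstack are only illustrative:

```python
import numpy as np

# Hypothetical stand-in data: 4 arrays of shape (3, 5), each filled with its index.
matrices = [np.full((3, 5), i, dtype=np.float32) for i in range(4)]

as_array = np.array(matrices)    # list index becomes the FIRST axis
as_dstack = np.dstack(matrices)  # list index becomes the THIRD axis

print(as_array.shape)   # (4, 3, 5)
print(as_dstack.shape)  # (3, 5, 4)

# Same data, different axis order: moving the last axis to the front
# turns the dstack result into the numpy.array result.
same = np.array_equal(as_dstack.transpose(2, 0, 1), as_array)
print(same)  # True
```

So the two results hold the same values, and which one is preferable depends on whether the list index should be the first or the last axis.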
If you want to use vstack, it will return a 25600x320 array:
timeit numpy.vstack(firstmatrices)
100 loops, best of 3: 18.2 ms per loop
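On the preallocation part of the question, a rough sketch of the options might look like this. It again assumes a small stand-in list instead of the 80 arrays of 320x320, and the variable names are illustrative. No timings are claimed here; they would need to be measured on the real data:

```python
import numpy as np

# Hypothetical stand-in data: 4 arrays of shape (3, 5), each filled with its index.
n_matrices, height, width = 4, 3, 5
matrices = [np.full((height, width), i, dtype=np.float32) for i in range(n_matrices)]

# Option 1: a single vstack call on the whole list (no Python-level loop).
stacked = np.vstack(matrices)

# Option 2: explicit preallocation. Allocate the result once,
# then copy each matrix into its block of rows.
merged = np.empty((n_matrices * height, width), dtype=matrices[0].dtype)
for i, block in enumerate(matrices):
    merged[i * height:(i + 1) * height] = block

# Option 3: numpy.array followed by a reshape that folds the first two axes together.
reshaped = np.array(matrices).reshape(-1, width)

print(stacked.shape)  # (12, 5)
print(np.array_equal(stacked, merged) and np.array_equal(stacked, reshaped))  # True
```

All three give the same (n_matrices * height, width) result. Calling vstack once on the full list avoids the repeated copying that happens when vstack is called inside a loop on a growing array.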