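What is the fastest way to merge a list of numpy arrays into one array, if the length of the list and the size of the arrays are known, and the size is the same for all of them?
I tried two approaches:
merged_array = array(list_of_arrays), from "Pythonic way to create a numpy array from a list of numpy arrays", and
vstack
As you can see, vstack is faster, but for some reason the first run takes three times as long as the second. I assume this is caused by (missing) preallocation. So how would I preallocate an array for vstack? Or do you know a faster method?
Thanks!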
Update
I want (25280, 320) rather than (80, 320, 320), which means merged_array = array(list_of_arrays) won't work for me. Thanks to Joris for pointing that out!
Output:
0.547468900681 s merged_array = array(first_list_of_arrays)
0.547191858292 s merged_array = array(second_list_of_arrays)
0.656183958054 s vstack first
0.236850976944 s vstack second
Code:
import time
import numpy

width = 320
height = 320
n_matrices = 80

# Build two independent lists of 80 float32 matrices holding whole numbers 0..9.
secondmatrices = list()
for i in range(n_matrices):
    temp = numpy.random.rand(height, width).astype(numpy.float32)
    secondmatrices.append(numpy.round(temp*9))

firstmatrices = list()
for i in range(n_matrices):
    temp = numpy.random.rand(height, width).astype(numpy.float32)
    firstmatrices.append(numpy.round(temp*9))

# Approach 1: numpy.array on the whole list, which gives shape (80, 320, 320).
t1 = time.time()
first1 = numpy.array(firstmatrices)
print(time.time() - t1, "s merged_array = array(first_list_of_arrays)")

t1 = time.time()
second1 = numpy.array(secondmatrices)
print(time.time() - t1, "s merged_array = array(second_list_of_arrays)")

# Approach 2: repeated vstack, one matrix at a time.
# Note that pop() empties the source list as it goes.
t1 = time.time()
first2 = firstmatrices.pop()
for i in range(len(firstmatrices)):
    first2 = numpy.vstack((firstmatrices.pop(), first2))
print(time.time() - t1, "s vstack first")

t1 = time.time()
second2 = secondmatrices.pop()
for i in range(len(secondmatrices)):
    second2 = numpy.vstack((secondmatrices.pop(), second2))
print(time.time() - t1, "s vstack second")
Answered 2011-05-17 20:59:01
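You have 80 arrays of 320x320? Then you probably want to use dstack:
first3 = numpy.dstack(firstmatrices)
This returns a single 3-D array, like numpy.array(firstmatrices) does, but with the stacking axis last: the shape is (320, 320, 80) rather than (80, 320, 320).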
timeit numpy.dstack(firstmatrices)
10 loops, best of 3: 47.1 ms per loop
timeit numpy.array(firstmatrices)
1 loops, best of 3: 750 ms per loop
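To make the difference between the layouts concrete, here is a small sketch that compares the shapes produced by numpy.array, numpy.dstack and numpy.vstack. The list `matrices` and its tiny 4x5 size are hypothetical stand-ins for the 80 matrices of 320x320 in the question, chosen only to keep the shapes easy to read.

```python
import numpy as np

# Hypothetical stand-in data: 3 matrices of shape (4, 5) instead of 80 of (320, 320).
matrices = [np.full((4, 5), i, dtype=np.float32) for i in range(3)]

as_array = np.array(matrices)   # stacks along a new leading axis
depth = np.dstack(matrices)     # stacks along a new trailing (third) axis
rows = np.vstack(matrices)      # concatenates along the existing first axis

print(as_array.shape)  # (3, 4, 5)
print(depth.shape)     # (4, 5, 3)
print(rows.shape)      # (12, 5)

# Moving the trailing axis of the dstack result to the front
# gives the same layout as np.array(matrices).
same = np.array_equal(depth.transpose(2, 0, 1), as_array)
print(same)  # True
```

So dstack and array hold the same values, only with the axes in a different order. If the (n, height, width) order matters downstream, the transpose above returns a view with that order.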
If you want to use vstack, it returns a 25600x320 array:
timeit numpy.vstack(firstmatrices)
100 loops, best of 3: 18.2 ms per loop
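The question also asks how to preallocate the result. One possible approach, sketched here under the assumption that all matrices share the same shape and dtype, is to allocate the output once with numpy.empty and copy each matrix into its slice. The helper name `stack_preallocated` is hypothetical, and the small sizes are for illustration only. No timing claim is made here, so it is worth measuring against a single numpy.vstack call on your own data.

```python
import numpy as np

def stack_preallocated(arrays):
    """Hypothetical helper: stack equally shaped 2-D arrays along axis 0."""
    n = len(arrays)
    height, width = arrays[0].shape
    # Allocate the full result once instead of growing it step by step.
    out = np.empty((n * height, width), dtype=arrays[0].dtype)
    for i, a in enumerate(arrays):
        out[i * height:(i + 1) * height] = a
    return out

# Small stand-in data built the same way as in the question.
matrices = [np.round(np.random.rand(4, 5).astype(np.float32) * 9)
            for _ in range(3)]

merged = stack_preallocated(matrices)
stacked = np.vstack(matrices)  # single call on the whole list

# The 3-D result of np.array can also be reshaped to the same 2-D layout.
reshaped = np.array(matrices).reshape(-1, matrices[0].shape[1])

print(merged.shape)                       # (12, 5)
print(np.array_equal(merged, stacked))    # True
print(np.array_equal(merged, reshaped))   # True
```

All three routes produce the same 2-D array. The main difference from the loop in the question is that the list is passed to vstack once, so the result is not re-copied on every iteration.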
https://stackoverflow.com/questions/6030906