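I'm trying to optimize some code that performs a large number of sequential matrix operations.
I assumed numpy.linalg.multi_dot
(docs here) would perform all of the operations in C or BLAS, and so would be much faster than something like arr1.dot(arr2).dot(arr3)
and so on.
I was really surprised when I ran this code in a notebook: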
import numpy as np
v1 = np.random.rand(2,2)
v2 = np.random.rand(2,2)
%%timeit
v1.dot(v2.dot(v1.dot(v2)))
The slowest run took 9.01 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.14 µs per loop
%%timeit
np.linalg.multi_dot([v1,v2,v1,v2])
The slowest run took 4.67 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 32.9 µs per loop
It turns out that the same operation is roughly 10 times slower with multi_dot
(32.9 µs versus 3.14 µs per loop).
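In case it helps to reproduce this outside a notebook, here is a minimal sketch using the standard timeit module. The function names chained_dot and with_multi_dot are just labels made up for this snippet, and the seed and repetition counts are arbitrary. Absolute timings will differ by machine and NumPy version, so no expected numbers are shown.

```python
import timeit

import numpy as np

# Same setup as the notebook cells above: two small 2x2 matrices.
np.random.seed(0)  # arbitrary seed, only to make the run repeatable
v1 = np.random.rand(2, 2)
v2 = np.random.rand(2, 2)


def chained_dot():
    # Explicitly nested .dot() calls, evaluated right to left.
    return v1.dot(v2.dot(v1.dot(v2)))


def with_multi_dot():
    # multi_dot picks the multiplication order itself.
    return np.linalg.multi_dot([v1, v2, v1, v2])


# Both expressions compute the same product, so the results should agree.
assert np.allclose(chained_dot(), with_multi_dot())

n = 10_000  # calls per timing run
t_chain = min(timeit.repeat(chained_dot, number=n, repeat=3)) / n
t_multi = min(timeit.repeat(with_multi_dot, number=n, repeat=3)) / n

print(f"chained dot: {t_chain * 1e6:.2f} µs per call")
print(f"multi_dot:   {t_multi * 1e6:.2f} µs per call")
```

Taking the minimum over the repeats mirrors the "best of 3" figure that %%timeit reports.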
My question is:
https://stackoverflow.com/questions/45852228