NumPy/SciPy中如何使用多线程整数矩阵乘法？内容来源于 Stack Overflow，并遵循CC BY-SA 3.0许可协议进行翻译与使用

• 回答 (1)
• 关注 (0)
• 查看 (118)

```import numpy as np
a = np.random.rand(10**4, 10**4)
b = np.dot(a, a)```

`a = np.random.randint(2, size=(n, n)).astype(np.int8)`

```array: np.random.randint(2, size=shape).astype(dtype)

dtype    shape          %time (average)

float32 (2000, 2000)    62.5 ms
float32 (3000, 3000)    219 ms
float32 (4000, 4000)    328 ms
float32 (10000, 10000)  4.09 s

int8    (2000, 2000)    13 seconds
int8    (3000, 3000)    3min 26s
int8    (4000, 4000)    12min 20s
int8    (10000, 10000)  It didn't finish in 6 hours

float16 (2000, 2000)    2min 25s
float16 (3000, 3000)    Not tested
float16 (4000, 4000)    Not tested
float16 (10000, 10000)  Not tested```

```import scipy.linalg.blas as blas
a = np.random.randint(2, size=(n, n)).astype(np.int8)
b = blas.sgemm(alpha=1.0, a=a, b=a)```

1. 使用NumPy的痛苦缓慢`np.dot`很高兴我保留了8位整数。
2. 使用SciPy的`sgemm`消耗掉4倍的内存。
3. 使用Numpy的`np.float16`只消耗了2x内存，但请注意`np.dot`在浮动16数组上要比在浮动32数组上慢得多，比int 8慢得多。
4. 找到一个优化的多线程整数矩阵乘法库(实际上，数学这样做，但我更喜欢Python解决方案)，理想情况下支持1位数组，尽管8位数组也很好。

1 个回答

```import numpy as np
from numpy.testing import assert_array_equal
from time import time

def blockshaped(arr, nrows, ncols):
"""
Return an array of shape (nrows, ncols, n, m) where
n * nrows, m * ncols = arr.shape.
This should be a view of the original array.
"""
h, w = arr.shape
n, m = h // nrows, w // ncols
return arr.reshape(nrows, n, ncols, m).swapaxes(1, 2)

def do_dot(a, b, out):
#np.dot(a, b, out)  # does not work. maybe because out is not C-contiguous?
out[:] = np.dot(a, b)  # less efficient because the output is stored in a temporary array?

def pardot(a, b, nblocks, mblocks, dot_func=do_dot):
"""
Return the matrix product a * b.
The product is split into nblocks * mblocks partitions that are performed
"""
n_jobs = nblocks * mblocks
print('running {} jobs in parallel'.format(n_jobs))

out = np.empty((a.shape[0], b.shape[1]), dtype=a.dtype)

out_blocks = blockshaped(out, nblocks, mblocks)
a_blocks = blockshaped(a, nblocks, 1)
b_blocks = blockshaped(b, 1, mblocks)

for i in range(nblocks):
for j in range(mblocks):
args=(a_blocks[i, 0, :, :],
b_blocks[0, j, :, :],
out_blocks[i, j, :, :]))
th.start()

th.join()

return out

if __name__ == '__main__':
a = np.ones((4, 3), dtype=int)
b = np.arange(18, dtype=int).reshape(3, 6)
assert_array_equal(pardot(a, b, 2, 2), np.dot(a, b))

a = np.random.randn(1500, 1500).astype(int)

start = time()
pardot(a, a, 2, 4)
time_par = time() - start
print('pardot: {:.2f} seconds taken'.format(time_par))

start = time()
np.dot(a, a)
time_dot = time() - start
print('np.dot: {:.2f} seconds taken'.format(time_dot))```

```running 8 jobs in parallel
pardot: 5.45 seconds taken
np.dot: 22.30 seconds taken```