Sparse matrix compression: sparse.csr_matrix and sparse.csc_matrix explained

学到老

Published 2018-04-17 15:04:18

Overview

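When doing scientific computing in Python, you often need to compress a sparse np.array. That is where sparse.csr_matrix (CSR: Compressed Sparse Row matrix) and sparse.csc_matrix (CSC: Compressed Sparse Column matrix) from the scipy library come in. For the full reference, see the official SciPy documentation.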
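As a minimal sketch of that compression step, passing a dense array straight to csr_matrix keeps only the non-zero entries. The names `dense`, `compressed`, and `restored` are illustrative and not from the original post; the array is the same 3 * 3 matrix used in the csr_matrix section.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative dense input; zeros are the entries we do not want to store.
dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 5, 6]])

# Build the CSR representation directly from the dense array.
compressed = csr_matrix(dense)

print(compressed.nnz)      # 6 stored (non-zero) entries
print(compressed.data)     # [1 2 3 4 5 6]
print(compressed.indices)  # [0 2 2 0 1 2]
print(compressed.indptr)   # [0 2 3 6]

# Expanding back to a dense array recovers the input.
restored = compressed.toarray()
print(np.array_equal(restored, dense))  # True
```

The three printed arrays are exactly the (data, indices, indptr) triple that the csr_matrix section builds by hand.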

csr_matrix

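>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> indptr = np.array([0, 2, 3, 6])  # starts at 0; the number of entries after the 0 equals the number of rows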
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])

Note: matrix indices start at 0.

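Where:

data holds the stored values, here 1, 2, 3, 4, 5, 6.

shape is the shape of the matrix, here 3 * 3.

indices gives the column index of each value within its row. From it we know that value 1 sits at column 0 of some row, value 2 at column 2 of some row, and value 6 at column 2 of some row. Which row each value belongs to comes from indptr.

indptr holds the cumulative row offsets. It starts at 0, and the number of entries after the 0 equals the number of rows. For [0 2 3 6], row 0 has 2 - 0 = 2 values, so 1 and 2 are in row 0; row 1 has 3 - 2 = 1 value, so 3 is in row 1; row 2 has 6 - 3 = 3 values, so 4, 5, and 6 are in row 2. In this way we can recover the row number of every value.

Example: for the value 6, indptr places it in row 2 and indices places it in column 2.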
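The row-by-row reading described above can be sketched as a small loop. `csr_to_dense` is a hypothetical helper written for illustration only, not a SciPy function; it is checked against what csr_matrix itself produces.

```python
import numpy as np
from scipy.sparse import csr_matrix

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])


def csr_to_dense(data, indices, indptr, shape):
    """Hypothetical helper: expand CSR arrays into a dense array by hand."""
    dense = np.zeros(shape, dtype=data.dtype)
    n_rows = len(indptr) - 1
    for row in range(n_rows):
        # Positions start..end-1 of data/indices belong to this row.
        start, end = indptr[row], indptr[row + 1]
        for col, value in zip(indices[start:end], data[start:end]):
            # Accumulate, so a repeated (row, col) pair adds up.
            dense[row, col] += value
    return dense


manual = csr_to_dense(data, indices, indptr, (3, 3))
reference = csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()

print(manual)
print(np.array_equal(manual, reference))  # True
```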

csc_matrix

The csr_matrix section above gives the informal, easy-to-follow explanation. Below we use csc_matrix as the example and walk through a more formal one:

# Worked example
>>> from scipy.sparse import csc_matrix
>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csc_matrix((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 4],
       [0, 0, 5],
       [2, 3, 6]])
# Compressed by column
# For column i, the rows of the non-zero entries are indices[indptr[i]:indptr[i+1]] and their values are data[indptr[i]:indptr[i+1]]
# This example has three columns
# Column 0: the rows of the non-zero entries are indices[indptr[0]:indptr[1]] = indices[0:2] = [0, 2], so there are 2 of them (indices[0] and indices[1]); the values are data[indptr[0]:indptr[1]] = data[0:2] = [1, 2], so column 0 holds 1 in row 0 and 2 in row 2
# Column 1: the rows of the non-zero entries are indices[indptr[1]:indptr[2]] = indices[2:3] = [2], so there is 1 of them (indices[2])
# the values are data[indptr[1]:indptr[2]] = data[2:3] = [3], so column 1 holds 3 in row 2
# Column 2: the rows of the non-zero entries are indices[indptr[2]:indptr[3]] = indices[3:6] = [0, 1, 2]
# the values are data[indptr[2]:indptr[3]] = data[3:6] = [4, 5, 6], so column 2 holds 4 in row 0, 5 in row 1, and 6 in row 2
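The column slicing in the walkthrough above can be replayed in a few lines. This is only a sketch; the names `columns`, `as_csc`, `as_csr`, and `same` are illustrative. It also checks one consequence of the two layouts: feeding the same three arrays to csc_matrix and csr_matrix yields matrices that are transposes of each other, because one reads the arrays column by column and the other row by row.

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])

# Slice out each column: row numbers come from indices, values from data.
columns = []
for j in range(len(indptr) - 1):
    rows = indices[indptr[j]:indptr[j + 1]]
    values = data[indptr[j]:indptr[j + 1]]
    columns.append((rows.tolist(), values.tolist()))
    print(f"column {j}: rows {rows.tolist()}, values {values.tolist()}")

as_csc = csc_matrix((data, indices, indptr), shape=(3, 3)).toarray()
as_csr = csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()

# Same arrays, read by column versus by row.
same = np.array_equal(as_csc, as_csr.T)
print(same)  # True
```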

coo_matrix

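This one is even easier and only takes a minute. Straight to the example: data[i] is stored at row row[i], column col[i], and every other position is 0.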

>>> from scipy.sparse import coo_matrix
>>> coo_matrix((3, 4), dtype=np.int8).toarray()
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int8)

>>> row  = np.array([0, 3, 1, 0])
>>> col  = np.array([0, 3, 1, 2])
>>> data = np.array([4, 5, 7, 9])
>>> coo_matrix((data, (row, col)), shape=(4, 4)).toarray()
array([[4, 0, 9, 0],
       [0, 7, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 5]])
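To connect the coordinate format back to the compressed formats, here is a small sketch that lists the stored (row, col, value) triples and then converts the same matrix to CSR with tocsr(). The names `coo`, `triples`, and `csr` are illustrative.

```python
import numpy as np
from scipy.sparse import coo_matrix

row = np.array([0, 3, 1, 0])
col = np.array([0, 3, 1, 2])
data = np.array([4, 5, 7, 9])

coo = coo_matrix((data, (row, col)), shape=(4, 4))

# Each stored triple says: put data[i] at position (row[i], col[i]).
triples = [(int(r), int(c), int(v)) for r, c, v in zip(coo.row, coo.col, coo.data)]
print(triples)  # [(0, 0, 4), (3, 3, 5), (1, 1, 7), (0, 2, 9)]

# Convert to CSR; row 2 is empty, so its offset in indptr repeats.
csr = coo.tocsr()
print(csr.indptr)  # [0 2 3 3 4]
```

The repeated 3 in indptr is how CSR encodes an empty row: 3 - 3 = 0 values belong to row 2.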

Still feels convoluted? If you have a better explanation, please share it in the comments!

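This article is part of the Tencent Cloud self-media syndication program and is shared from the author's personal site/blog.

Originally published: 2018-04-02. In case of infringement, contact cloudcommunity@tencent.com for removal.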

python
