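I have a very large sparse matrix (100000 rows by 100000 columns). I want to select some rows of this sparse matrix and use them to form a new sparse matrix. I first tried converting the rows to dense matrices and then converting them back to a sparse matrix, but Python raises a "MemoryError" when I do that. Then I tried another approach: I select rows of the sparse matrix and put them into a list, but when I try to convert this list to a sparse matrix, I get "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()." So how can I turn this list of sparse matrices into one big sparse matrix?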
import numpy as np

# X_train is a 100000 x 100000 matrix stored in sparse form
# y_train is a 1-dimensional array of length 100000
# I try to build a new sparse matrix from some rows of X_train;
# the selection criterion is that the row sums to 0
y_train_new = []
X_train_new = []
for i in range(len(y_train)):
    if np.sum(X_train[i].toarray()[0]) == 0:
        X_train_new.append(X_train[i])
        y_train_new.append(y_train[i])
When I then do:
X_train_new = scipy.sparse.csr_matrix(X_train_new)
I get the error message:
'ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all().'
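Posted 2019-06-05 08:40:50
I added some tags that should help your question get seen faster.
When asking about an error, it's a good idea to provide some or all of the traceback, so we can see where the error occurs. Information about the inputs to the failing function call also helps.
Fortunately I can recreate the problem fairly easily, and in a reasonably sized example. No need to make a 100000 x 100000 matrix that no one can look at!
Make a modest-sized sparse matrix: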
In [126]: M = sparse.random(10,10,.1,'csr')
In [127]: M
Out[127]:
<10x10 sparse matrix of type '<class 'numpy.float64'>'
with 10 stored elements in Compressed Sparse Row format>
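I can do a whole-matrix row sum, just as with a dense array. The sparse code actually uses matrix-vector multiplication to do this, which is why the result is a dense matrix.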
In [128]: M.sum(axis=1)
Out[128]:
matrix([[0.59659958],
[0.80390719],
[0.37251645],
[0. ],
[0.85766909],
[0.42267366],
[0.76794737],
[0. ],
[0.83131054],
[0.46254634]])
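It's sparse enough that some rows have no nonzero values. With floats, especially ones in the 0-1 range, I'm not going to get rows where the nonzero values cancel each other out.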
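Since the rows of interest here are simply empty, one alternative to summing values is to count the stored entries per row from the CSR index pointer. This is only a minimal sketch on a small hand-built matrix; the array dense and the names stored_per_row and empty_rows are illustrative and not part of the session above. It assumes the matrix holds no explicitly stored zeros, because those would count as stored entries:

```python
import numpy as np
from scipy import sparse

# Small hand-built CSR matrix; rows 1 and 3 hold no stored values.
dense = np.array([
    [0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.2, 0.7],
    [0.0, 0.0, 0.0],
])
M = sparse.csr_matrix(dense)

# indptr[i + 1] - indptr[i] is the number of stored entries in row i.
stored_per_row = np.diff(M.indptr)
empty_rows = np.flatnonzero(stored_per_row == 0)

print(stored_per_row)  # [1 0 2 0]
print(empty_rows)      # [1 3]
```

Unlike a value sum, this check cannot be fooled by positive and negative entries that cancel out.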
Or, using your row-by-row calculation:
In [133]: alist = [np.sum(row.toarray()[0]) for row in M]
In [134]: alist
Out[134]:
[0.5965995802776853,
0.8039071870427961,
0.37251644566924424,
0.0,
0.8576690924353791,
0.42267365715276595,
0.7679473651419432,
0.0,
0.8313105376003095,
0.4625463360625408]
And selecting the rows that sum to zero (in this case, the empty ones):
In [135]: alist = [row for row in M if np.sum(row.toarray()[0])==0]
In [136]: alist
Out[136]:
[<1x10 sparse matrix of type '<class 'numpy.float64'>'
with 0 stored elements in Compressed Sparse Row format>,
<1x10 sparse matrix of type '<class 'numpy.float64'>'
with 0 stored elements in Compressed Sparse Row format>]
Note that this is a list of sparse matrices. That's what you got as well, right?
Now if I try to make a matrix out of that list, I get your error:
In [137]: sparse.csr_matrix(alist)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-137-5e20e6fc2524> in <module>
----> 1 sparse.csr_matrix(alist)
/usr/local/lib/python3.6/dist-packages/scipy/sparse/compressed.py in __init__(self, arg1, shape, dtype, copy)
86 "".format(self.format))
87 from .coo import coo_matrix
---> 88 self._set_self(self.__class__(coo_matrix(arg1, dtype=dtype)))
89
90 # Read matrix dimensions given, if any
/usr/local/lib/python3.6/dist-packages/scipy/sparse/coo.py in __init__(self, arg1, shape, dtype, copy)
189 (shape, self._shape))
190
--> 191 self.row, self.col = M.nonzero()
192 self.data = M[self.row, self.col]
193 self.has_canonical_format = True
/usr/local/lib/python3.6/dist-packages/scipy/sparse/base.py in __bool__(self)
285 return self.nnz != 0
286 else:
--> 287 raise ValueError("The truth value of an array with more than one "
288 "element is ambiguous. Use a.any() or a.all().")
289 __nonzero__ = __bool__
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
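OK, this error doesn't tell me a lot (at least not without reading more of the code), but it is clearly having problems with the input list. But read the csr_matrix docs again! Do they say that we can give it a list of sparse matrices?
There is, however, a sparse.vstack function that works with a list of matrices (modeled on np.vstack):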
In [140]: sparse.vstack(alist)
Out[140]:
<2x10 sparse matrix of type '<class 'numpy.float64'>'
with 0 stored elements in Compressed Sparse Row format>
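As a self-contained check of the same idea, the sketch below splits a small matrix into single-row sparse matrices and joins them again with sparse.vstack. The example matrix and the names rows and stacked are made up for illustration; passing format="csr" is optional and simply requests CSR output:

```python
import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.array([
    [1.0, 0.0, 2.0],
    [0.0, 0.0, 0.0],
    [0.0, 3.0, 0.0],
]))

# Slicing one row at a time gives a list of 1x3 sparse matrices.
rows = [M[i] for i in range(M.shape[0])]

# vstack joins them along axis 0 without ever building a dense array.
stacked = sparse.vstack(rows, format="csr")

print(stacked.shape)  # (3, 3)
print(np.array_equal(stacked.toarray(), M.toarray()))  # True
```

The toarray calls are there only to compare results on this tiny example; on a 100000 x 100000 matrix they should be avoided.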
We get more interesting results if we select the rows that don't sum to zero:
In [141]: alist = [row for row in M if np.sum(row.toarray()[0])!=0]
In [142]: M1=sparse.vstack(alist)
In [143]: M1
Out[143]:
<8x10 sparse matrix of type '<class 'numpy.float64'>'
with 10 stored elements in Compressed Sparse Row format>
But I showed earlier that we can get the row sums without iterating. Applying where to Out[128], I get the row indices (of the nonzero rows):
In [151]: idx=np.where(M.sum(axis=1))
In [152]: idx
Out[152]: (array([0, 1, 2, 4, 5, 6, 8, 9]), array([0, 0, 0, 0, 0, 0, 0, 0]))
In [153]: M2=M[idx[0],:]
In [154]: M2
Out[154]:
<8x10 sparse matrix of type '<class 'numpy.float64'>'
with 10 stored elements in Compressed Sparse Row format>
In [155]: np.allclose(M1.A, M2.A)
Out[155]: True
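Applied to the original question, the same non-iterative selection might look like the sketch below. The helper select_rows_by_sum and its keep_zero_sum parameter are hypothetical names introduced here, and the tiny X_train and y_train merely stand in for the real data. It assumes X is in CSR format, so that row indexing with an integer array is supported and efficient:

```python
import numpy as np
from scipy import sparse

def select_rows_by_sum(X, y, keep_zero_sum=True):
    """Return the rows of X (and matching entries of y) chosen by row sum."""
    # X.sum(axis=1) returns an (n, 1) np.matrix; flatten it to a 1-D array.
    row_sums = np.asarray(X.sum(axis=1)).ravel()
    mask = (row_sums == 0) if keep_zero_sum else (row_sums != 0)
    idx = np.flatnonzero(mask)
    # Integer-array row indexing keeps the result sparse.
    return X[idx], np.asarray(y)[idx]

# Stand-in data: rows 1 and 3 sum to zero.
X_train = sparse.csr_matrix(np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 2.0, 3.0],
    [0.0, 0.0, 0.0],
]))
y_train = np.array([10, 11, 12, 13])

X_new, y_new = select_rows_by_sum(X_train, y_train)
print(X_new.shape)  # (2, 3)
print(y_new)        # [11 13]
```

Only the dense (n, 1) column of row sums is materialized, which for 100000 rows is small, so this avoids both the MemoryError and the Python-level loop.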
====
I suspect the error in In[137] comes from trying to find the nonzero (np.where) elements of the input, or of the input converted to a numpy array:
In [159]: alist = [row for row in M if np.sum(row.toarray()[0])==0]
In [160]: np.array(alist)
Out[160]:
array([<1x10 sparse matrix of type '<class 'numpy.float64'>'
with 0 stored elements in Compressed Sparse Row format>,
<1x10 sparse matrix of type '<class 'numpy.float64'>'
with 0 stored elements in Compressed Sparse Row format>], dtype=object)
In [161]: np.array(alist).nonzero()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-161-832a25987c15> in <module>
----> 1 np.array(alist).nonzero()
/usr/local/lib/python3.6/dist-packages/scipy/sparse/base.py in __bool__(self)
285 return self.nnz != 0
286 else:
--> 287 raise ValueError("The truth value of an array with more than one "
288 "element is ambiguous. Use a.any() or a.all().")
289 __nonzero__ = __bool__
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
np.array applied to a list of sparse matrices produces an object dtype array of those matrices.
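Both tracebacks end in the sparse matrix's __bool__ method, which suggests that nonzero is asking each element of the object array for its truth value. A minimal sketch that triggers the same exception directly (the names row, single and raised are illustrative):

```python
from scipy import sparse

row = sparse.csr_matrix((1, 10))  # an empty 1x10 sparse matrix

try:
    bool(row)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# A 1x1 sparse matrix is the one shape that does have a truth value.
single = sparse.csr_matrix([[0.0]])
print(bool(single))  # False
```

So the message is not specific to csr_matrix construction; anything that tests the truth value of a sparse matrix larger than 1x1 raises it.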
https://stackoverflow.com/questions/56319794