
Incompatible row dimensions error when training a model

Stack Overflow user
Asked 2020-05-16 09:57:44
Answers: 1, Views: 732, Follows: 0, Votes: 3

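I am implementing a decision tree on a dataset. Before that, I want to transform one particular column with CountVectorizer. To keep things simple, I am using a pipeline.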

But I get an error about incompatible row dimensions.

Code

# Import the libraries
from sklearn.feature_extraction.text import CountVectorizer as cv
from sklearn.preprocessing import OneHotEncoder as ohe
from sklearn.compose import ColumnTransformer as ct
from sklearn.pipeline import make_pipeline as mp
from sklearn.tree import DecisionTreeClassifier as dtc
from sklearn.model_selection import train_test_split as tts


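# Bag-of-words counts for the review text, one-hot dummies for 'variation';
# 'rating' passes through unchanged
transformer=ct(transformers=[('review_counts',cv(),['verified_reviews']),
                             ('variation_dummies', ohe(),['variation'])
                            ],remainder='passthrough')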

pipe= mp(transformer,dtc(random_state=42))

# data is the asker's DataFrame (not shown in the question)
x= data[['rating','variation','verified_reviews']].copy()
y= data.feedback

x_train,x_test,y_train,y_test= tts(x,y,test_size=0.3,random_state=42,stratify=y)
print(x_train.shape,y_train.shape)             # (2205, 3) (2205,)

pipe.fit(x_train,y_train)                       # Error on this line

Error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-79-a981c354b190> in <module>()
----> 1 pipe.fit(x_train,y_train)

7 frames
/usr/local/lib/python3.6/dist-packages/scipy/sparse/construct.py in bmat(blocks, format, dtype)
    584                                                     exp=brow_lengths[i],
    585                                                     got=A.shape[0]))
--> 586                     raise ValueError(msg)
    587 
    588                 if bcol_lengths[j] == 0:

ValueError: blocks[0,:] has incompatible row dimensions. Got blocks[0,1].shape[0] == 2205, expected 1.
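The last frame of the traceback is scipy.sparse.bmat, which scipy.sparse.hstack calls when ColumnTransformer joins the sparse outputs of its transformers side by side. Horizontal stacking requires every block to have the same number of rows, so "expected 1" suggests the first block (from CountVectorizer) had a single row while the second (from OneHotEncoder) had 2205. The minimal sketch below reproduces that failure with hypothetical all-zero blocks; the column counts are arbitrary and only the row counts mirror the message.

```python
from scipy import sparse

# Hypothetical stand-ins for the two blocks being stacked:
# a 1-row block (like the CountVectorizer output) and a 2205-row block
# (like the OneHotEncoder output). Column counts are arbitrary.
review_counts = sparse.csr_matrix((1, 1))
variation_dummies = sparse.csr_matrix((2205, 3))

try:
    sparse.hstack([review_counts, variation_dummies])
    raised = False
except ValueError as exc:
    raised = True
    print(exc)  # exact wording differs between SciPy versions
```

Only the ValueError itself is stable; older SciPy releases report it with the blocks[0,:] wording seen above, while newer ones may phrase it differently.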

Questions

  1. How does the incompatible row dimensions error arise?
  2. How can I fix it?

1 answer

Stack Overflow user

Accepted answer

Posted 2020-05-16 15:02:50

Try passing the column to ohe as a list, but to cv as a plain string.

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer as cv
from sklearn.preprocessing import OneHotEncoder as ohe
from sklearn.compose import ColumnTransformer as ct
from sklearn.pipeline import make_pipeline as mp
from sklearn.tree import DecisionTreeClassifier as dtc

# Small random toy DataFrame standing in for the real data
data = pd.DataFrame({'rating':np.random.randint(0,10,6),'variation':['a','b','c','a','b','c'],
                   'verified_reviews':['adnf asd','sdf dsa','das j s','asd jd s','sad jds a','sajd'],
                   'feedback':np.random.randint(0,2,6)})

# A plain string for CountVectorizer (1D input), a list for OneHotEncoder (2D input)
transformer=ct(transformers=[('review_counts',cv(),'verified_reviews'),
                             ('variation_dummies', ohe(),['variation'])],
               remainder='passthrough')

pipe= mp(transformer, dtc(random_state=42))

x= data[['rating','variation','verified_reviews']].copy()
y= data.feedback

pipe.fit(x,y)
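A deterministic variant of this example can confirm that, with the string selector, every transformer now returns one row per sample before the blocks are stacked. This is a sketch: the rating and feedback values below are made up (hypothetical) in place of np.random so the shapes are reproducible, and the full class names are used instead of the aliases.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical fixed values replacing the random 'rating' and 'feedback' columns
data = pd.DataFrame({
    "rating": [5, 3, 4, 1, 2, 5],
    "variation": ["a", "b", "c", "a", "b", "c"],
    "verified_reviews": ["adnf asd", "sdf dsa", "das j s", "asd jd s", "sad jds a", "sajd"],
    "feedback": [1, 0, 1, 0, 0, 1],
})
x = data[["rating", "variation", "verified_reviews"]]
y = data["feedback"]

transformer = ColumnTransformer(
    transformers=[
        ("review_counts", CountVectorizer(), "verified_reviews"),  # string -> 1D Series
        ("variation_dummies", OneHotEncoder(), ["variation"]),     # list -> 2D DataFrame
    ],
    remainder="passthrough",  # 'rating' is appended unchanged
)

# One row per sample: 9 vocabulary terms + 3 dummies + 1 rating column
Xt = transformer.fit_transform(x)
print(Xt.shape)  # (6, 13)

pipe = make_pipeline(transformer, DecisionTreeClassifier(random_state=42))
pipe.fit(x, y)
pred = pipe.predict(x)
```

The 9 comes from CountVectorizer's default token pattern, which ignores single-character tokens such as "j", "s" and "a".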

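According to the documentation, when a transformer expects a 1D array as input, the column is specified as a string ("xxx"). For transformers that expect 2D data, the column must be specified as a list of strings (["xxx"]).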
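This rule also suggests where the single row in the error comes from. With ['verified_reviews'], ColumnTransformer passes CountVectorizer a one-column DataFrame; CountVectorizer iterates over its input expecting one string per document, and iterating over a DataFrame yields its column labels. The whole column therefore collapses into one "document", the column name itself. The sketch below shows this on hypothetical toy reviews.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data: three short reviews
df = pd.DataFrame({"verified_reviews": ["love it", "not great", "love the sound"]})

# String selector -> 1D Series: each row is one document
by_string = CountVectorizer().fit_transform(df["verified_reviews"])
print(by_string.shape)  # (3, 6)

# List selector -> 2D DataFrame: iteration yields the column label, so the
# only "document" is the string "verified_reviews" (a single token)
by_list = CountVectorizer().fit_transform(df[["verified_reviews"]])
print(by_list.shape)  # (1, 1)
```

A 1-row block next to a 2205-row block from OneHotEncoder matches "Got blocks[0,1].shape[0] == 2205, expected 1" in the question's traceback.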

Votes: 4
Original page content provided by Stack Overflow; translation provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/61834976
