I have a multiclass logistic regression model. Using a very basic sklearn pipeline, I am taking the cleaned text description of an object and classifying that object into a category.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression

logreg = Pipeline([('vect', CountVectorizer()),
                   ('tfidf', TfidfTransformer()),
                   ('clf', LogisticRegression(n_jobs=1, C=cVal)),
                   ])
Initially, with a regularization strength of C = 1e5, I was getting 78% accuracy on my test set and nearly 100% accuracy on my training set (not sure if that's common). However, even though the model reached reasonable accuracy, I was warned that it did not converge and that I should increase the maximum number of iterations or scale the data:
ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)
Changing max_iter did nothing, but modifying C allowed the model to converge, at the cost of poor accuracy. Here are the results of testing different C values:
--------------------------------------------------------------------------------
C = 0.1
Model trained with accuracy 0.266403785488959 in 0.99mins
maxCoeff 7.64751682657047
aveProb 0.1409874146376454
[0.118305 0.08591412 0.09528015 ... 0.19066049 0.09083797 0.0999868 ]
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
C = 1
Model trained with accuracy 0.6291798107255521 in 1.72mins
maxCoeff 16.413911220284994
aveProb 0.4221365866656076
[0.46077294 0.80758323 0.12618175 ... 0.91545935 0.79839096 0.13214606]
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
(no converge)
C = 10
Model trained with accuracy 0.7720820189274448 in 1.9mins
maxCoeff 22.719712528228182
aveProb 0.7013386216302577
[0.92306384 0.97842762 0.71936027 ... 0.98604736 0.98845931 0.20129053]
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
(no converge)
C = 100
Model trained with accuracy 0.7847003154574133 in 1.89mins
maxCoeff 40.572468674674916
aveProb 0.8278969567537955
[0.98949986 0.99777337 0.94394682 ... 0.99882797 0.99992239 0.28833321]
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
(no converge)
C = 1000
Model trained with accuracy 0.7796529968454259 in 1.85mins
maxCoeff 72.19441171771533
aveProb 0.8845385182334065
[0.99817968 0.99980068 0.98481744 ... 0.9999964 0.99999998 0.36462353]
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
(no converge)
C = 10000
Model trained with accuracy 0.7757097791798108 in 1.88mins
maxCoeff 121.56900229473293
aveProb 0.9351308553465546
[0.99994777 0.99999677 0.98521023 ... 0.99999987 1. 0.48251051]
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
(no converge)
C = 100000
Model trained with accuracy 0.7785488958990536 in 1.84mins
maxCoeff 160.02719692775156
aveProb 0.9520556562102963
[0.99999773 0.99999977 0.98558839 ... 0.99999983 1. 0.54044361]
--------------------------------------------------------------------------------
So, as you can see, model training only converges for C values between 1e-3 and 1, but it never reaches the accuracy seen with the higher, non-converging C values.
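The sweep above can be sketched roughly as follows. This is a reconstruction, not the original script: the `sweep_C` helper name, the train/test argument names, and the way `maxCoeff`/`aveProb` are computed (largest absolute coefficient, mean top-class probability) are assumptions inferred from the log output.

```python
import time
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression

def sweep_C(X_train, y_train, X_test, y_test, c_values=(0.1, 1, 10, 100)):
    """Train one pipeline per C and report accuracy, the largest absolute
    coefficient, and the mean top-class probability, mirroring the log above."""
    results = {}
    for c in c_values:
        logreg = Pipeline([('vect', CountVectorizer()),
                           ('tfidf', TfidfTransformer()),
                           ('clf', LogisticRegression(n_jobs=1, C=c))])
        start = time.time()
        logreg.fit(X_train, y_train)
        acc = logreg.score(X_test, y_test)
        clf = logreg.named_steps['clf']
        max_coeff = np.abs(clf.coef_).max()
        ave_prob = logreg.predict_proba(X_test).max(axis=1).mean()
        results[c] = (acc, max_coeff, ave_prob)
        print(f"C = {c}: accuracy {acc:.3f} in {(time.time() - start) / 60:.2f}mins, "
              f"maxCoeff {max_coeff:.3f}, aveProb {ave_prob:.3f}")
    return results
```

Note how the log pattern falls out of this: as C grows (weaker regularization), `maxCoeff` and `aveProb` keep climbing even while test accuracy plateaus.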
Update: Below are the learning curves for C = 1 and C = 1e5. As I mentioned earlier, with the high, non-converging C values the training curve always seems to sit at or near 1 (0.9999999), whereas at C = 1, where the optimization converges, the curves look much more normal. That seems strange to me...

C = 1, converges

C = 1e5, does not converge
Here are the results of testing the different solvers:
--------------------------------------------------------------------------------
Solver = newton-cg
Model trained with accuracy 0.7810725552050474 in 6.23mins
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)
Solver = lbfgs
Model trained with accuracy 0.7847003154574133 in 1.93mins
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Solver = liblinear
Model trained with accuracy 0.7779179810725552 in 0.27mins
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
"the coef_ did not converge", ConvergenceWarning)
Solver = sag
Model trained with accuracy 0.7818611987381704 in 0.47mins
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
"the coef_ did not converge", ConvergenceWarning)
Solver = saga
Model trained with accuracy 0.782018927444795 in 0.54mins
--------------------------------------------------------------------------------
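The solver comparison above can be sketched like this. Again a reconstruction under assumptions: the `compare_solvers` helper and its arguments are hypothetical, and the real run presumably used the full tf-idf pipeline and dataset.

```python
import time
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression

def compare_solvers(X_train, y_train, X_test, y_test,
                    solvers=('newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'),
                    C=100, max_iter=100):
    """Train the same pipeline once per solver and report test accuracy."""
    accuracies = {}
    for solver in solvers:
        logreg = Pipeline([('vect', CountVectorizer()),
                           ('tfidf', TfidfTransformer()),
                           ('clf', LogisticRegression(solver=solver, C=C,
                                                      max_iter=max_iter))])
        start = time.time()
        logreg.fit(X_train, y_train)
        accuracies[solver] = logreg.score(X_test, y_test)
        print(f"Solver = {solver}: accuracy {accuracies[solver]:.4f} "
              f"in {(time.time() - start) / 60:.2f}mins")
    return accuracies
```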
Is this common behavior? Based on this behavior, can anyone tell whether I'm going down the wrong path?
Posted on 2020-07-16 16:09:11
I often see LogisticRegression "fail to converge" yet be fairly stable (meaning the coefficients don't change much between iterations).

Maybe there's some multicollinearity that leads to coefficients changing substantially without actually affecting many predictions/scores.

Another possibility (which seems to be the case here, thanks for testing) is that you're getting near-perfect separation on the training set. In unpenalized logistic regression, a linearly separable dataset has no best fit: the coefficients blow up to infinity (pushing the probabilities to 0 and 1). When you add regularization, it prevents those huge coefficients. So for large values of C, i.e. little regularization, you still get large coefficients and hence convergence may be slow, but the partially-converged model may still be quite good on the test set; whereas with large regularization you get much smaller coefficients, and poor performance on both the training and test sets.
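The coefficient blow-up on separable data is easy to see on a toy example. This is an illustration, not the poster's data: a tiny hand-made 1-D dataset that is perfectly linearly separable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# A perfectly separable 1-D dataset: every x < 0 is class 0, every x > 0 is class 1.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])

# With weaker regularization (larger C), the fitted coefficient keeps growing,
# pushing the predicted probabilities toward 0 and 1 -- the unpenalized
# optimum does not exist, so the coefficient heads off toward infinity.
coefs = []
for C in (0.1, 10, 1000, 100000):
    clf = LogisticRegression(C=C, max_iter=10000).fit(X, y)
    coefs.append(abs(clf.coef_[0, 0]))
    print(f"C = {C:>8}: |coef| = {clf.coef_[0, 0]:.3f}")
```

The printed `|coef|` values grow monotonically with C, which is the same pattern as the `maxCoeff` column in the question's sweep.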
If you're worried about non-convergence, you can try increasing n_iter (more), increasing tol, changing the solver, or scaling the features (though with tf-idf, I wouldn't think that would help).

I would look for the largest C that gives you good results, then go about trying to get that to converge with more iterations and/or different solvers.
Posted on 2020-07-22 13:35:35
Thanks to @BenReiniger's suggestion, I reduced the inverse regularization strength from C = 1e5 to C = 1e2. This allowed the model to converge, maximized test-set accuracy (compared with the other C values), and required only a single increase of max_iter from 100 to 350 iterations.
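The resulting configuration is, in sketch form, the original pipeline with those two parameters changed (the variable names mirror the question's code; the actual data loading is omitted):

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression

# C lowered from 1e5 to 1e2, and max_iter raised from the default 100 to 350,
# which is enough for the default lbfgs solver to converge here.
logreg = Pipeline([('vect', CountVectorizer()),
                   ('tfidf', TfidfTransformer()),
                   ('clf', LogisticRegression(n_jobs=1, C=1e2, max_iter=350)),
                   ])
```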
The learning curve below still shows very high training accuracy (though not exactly 1), but my research seems to indicate this isn't uncommon in high-dimensional applications of logistic regression such as text-based classification (my use case):

"It is common to get perfect classification during training when you have a high-dimensional dataset. Such datasets are often encountered in text-based classification, bioinformatics, etc."
https://datascience.stackexchange.com/questions/77813