
Loss Not Decreasing During Caffe Training

Published: 2019-05-25 (originally published 2018-07-05)
Column: SnailTyan

Copyright notice: these blog posts are the result of the author's own careful work; please credit the source when reposting, thank you! https://cloud.tencent.com/developer/article/1434113

Author: Tyan

Blog: noahsnail.com | CSDN | Jianshu

1. Problem Description

While training a classification model with Caffe today, the loss suddenly jumped to a large fixed value after a certain number of iterations and then stopped changing. The log is as follows:

I0705 14:57:14.980687   320 solver.cpp:218] Iteration 44 (2.60643 iter/s, 0.383667s/1 iters), loss = 0.263664
I0705 14:57:14.980741   320 solver.cpp:237]     Train net output #0: loss1/loss1 = 0.878881 (* 0.3 = 0.263664 loss)
I0705 14:57:14.980756   320 sgd_solver.cpp:105] Iteration 44, lr = 0.000956
I0705 14:57:15.365164   320 solver.cpp:218] Iteration 45 (2.60146 iter/s, 0.3844s/1 iters), loss = 20.7475
I0705 14:57:15.365226   320 solver.cpp:237]     Train net output #0: loss1/loss1 = 69.1584 (* 0.3 = 20.7475 loss)
I0705 14:57:15.365243   320 sgd_solver.cpp:105] Iteration 45, lr = 0.000955
I0705 14:57:15.759548   320 solver.cpp:218] Iteration 46 (2.53612 iter/s, 0.394303s/1 iters), loss = 0
I0705 14:57:15.759609   320 solver.cpp:237]     Train net output #0: loss1/loss1 = 0 (* 0.3 = 0 loss)
I0705 14:57:15.759624   320 sgd_solver.cpp:105] Iteration 46, lr = 0.000954
I0705 14:57:16.158644   320 solver.cpp:218] Iteration 47 (2.50621 iter/s, 0.39901s/1 iters), loss = 1.63756
I0705 14:57:16.158696   320 solver.cpp:237]     Train net output #0: loss1/loss1 = 5.45853 (* 0.3 = 1.63756 loss)
I0705 14:57:16.158715   320 sgd_solver.cpp:105] Iteration 47, lr = 0.000953
I0705 14:57:16.546782   320 solver.cpp:218] Iteration 48 (2.57693 iter/s, 0.388058s/1 iters), loss = 3.27512
I0705 14:57:16.546838   320 solver.cpp:237]     Train net output #0: loss1/loss1 = 10.9171 (* 0.3 = 3.27512 loss)
I0705 14:57:16.546855   320 sgd_solver.cpp:105] Iteration 48, lr = 0.000952
I0705 14:57:16.930493   320 solver.cpp:218] Iteration 49 (2.60667 iter/s, 0.383631s/1 iters), loss = 25.3822
I0705 14:57:16.930553   320 solver.cpp:237]     Train net output #0: loss1/loss1 = 84.6073 (* 0.3 = 25.3822 loss)
I0705 14:57:16.930568   320 sgd_solver.cpp:105] Iteration 49, lr = 0.000951
I0705 14:57:17.314102   320 solver.cpp:218] Iteration 50 (2.60741 iter/s, 0.383522s/1 iters), loss = 26.201
I0705 14:57:17.314185   320 solver.cpp:237]     Train net output #0: loss1/loss1 = 87.3365 (* 0.3 = 26.201 loss)
I0705 14:57:17.314213   320 sgd_solver.cpp:105] Iteration 50, lr = 0.00095
I0705 14:57:17.695567   320 solver.cpp:218] Iteration 51 (2.62216 iter/s, 0.381364s/1 iters), loss = 26.201
I0705 14:57:17.695627   320 solver.cpp:237]     Train net output #0: loss1/loss1 = 87.3365 (* 0.3 = 26.201 loss)
I0705 14:57:17.695642   320 sgd_solver.cpp:105] Iteration 51, lr = 0.000949
I0705 14:57:18.077605   320 solver.cpp:218] Iteration 52 (2.61813 iter/s, 0.381953s/1 iters), loss = 26.201
I0705 14:57:18.077667   320 solver.cpp:237]     Train net output #0: loss1/loss1 = 87.3365 (* 0.3 = 26.201 loss)
I0705 14:57:18.077684   320 sgd_solver.cpp:105] Iteration 52, lr = 0.000948
I0705 14:57:18.461403   320 solver.cpp:218] Iteration 53 (2.60613 iter/s, 0.383711s/1 iters), loss = 26.201
I0705 14:57:18.461458   320 solver.cpp:237]     Train net output #0: loss1/loss1 = 87.3365 (* 0.3 = 26.201 loss)

2. Solution

Adjusting hyperparameters such as the learning rate did not help. Investigation showed that the BN layers had been frozen during finetuning (i.e., use_global_stats in batch_norm_param was set to true). Setting use_global_stats to false solved the problem.
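For reference, a minimal sketch of what the relevant BatchNorm layer looks like in the finetuning prototxt (the layer and blob names here are hypothetical; the only relevant change is the use_global_stats flag):

layer {
  name: "conv1/bn"        # hypothetical layer name
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  batch_norm_param {
    # false: compute mean/variance from the current mini-batch during training;
    # true would freeze the layer to its stored global statistics
    use_global_stats: false
  }
}

With use_global_stats set to false, the layer keeps updating its moving mean and variance from the training batches; if the flag is left unset, Caffe defaults to batch statistics in the TRAIN phase and the stored global statistics in the TEST phase.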

References

  1. https://blog.csdn.net/u010911921/article/details/71079367
  2. https://github.com/BVLC/caffe/issues/3347