Debugging the Caffe Source Code

This article explains how to debug the Caffe source code with GDB on Linux. Stepping through the source this way is mainly a means to read and better understand how Caffe works.

1. Preparation

  1. Build Caffe in debug mode, i.e. uncomment DEBUG := 1 in Makefile.config before compiling.
  2. Download the MNIST dataset (the session below debugs on MNIST): bash data/mnist/get_mnist.sh
  3. Convert the MNIST dataset to LMDB: bash examples/mnist/create_mnist.sh
  4. Edit examples/mnist/lenet_solver.prototxt and change solver_mode from GPU to CPU.
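The solver edit in step 4 can be scripted with sed. The snippet below is a minimal sketch: steps 1-3 only make sense inside a real Caffe checkout, so they are shown as comments, and the edit itself is demonstrated on a stand-in file rather than the real examples/mnist/lenet_solver.prototxt.

```shell
# Steps 1-3, run inside a Caffe checkout (shown for context only):
#   sed -i 's/^# DEBUG := 1/DEBUG := 1/' Makefile.config && make -j
#   bash data/mnist/get_mnist.sh
#   bash examples/mnist/create_mnist.sh

# Step 4: switch the solver to CPU mode. A stand-in solver file is created
# here so the edit can be tried without a Caffe checkout.
cat > /tmp/lenet_solver.prototxt <<'EOF'
net: "examples/mnist/lenet_train_test.prototxt"
max_iter: 10000
solver_mode: GPU
EOF
sed -i 's/^solver_mode: GPU$/solver_mode: CPU/' /tmp/lenet_solver.prototxt
grep solver_mode /tmp/lenet_solver.prototxt   # prints: solver_mode: CPU
```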

2. Debugging

1. Starting GDB

Start the debugger with:

gdb --args build/tools/caffe train --solver examples/mnist/lenet_solver.prototxt

The --args flag tells GDB that everything after the program name is that program's argument list: the binary being debugged is build/tools/caffe, and its arguments are train --solver examples/mnist/lenet_solver.prototxt.

Output:

$ gdb --args build/tools/caffe train --solver examples/mnist/lenet_solver.prototxt
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/irteam/line-brain/deploy/caffe/.build_debug/tools/caffe.bin...done.

2. Setting a breakpoint

Run:

b src/caffe/layers/base_conv_layer.cpp:117

b stands for breakpoint; this one is placed at line 117 of base_conv_layer.cpp. The general form of the command is:

b path/to/code.cpp:LINE

The code around line 117:

117 channels_ = bottom[0]->shape(channel_axis_);
118 num_output_ = this->layer_param_.convolution_param().num_output();
119 CHECK_GT(num_output_, 0);

Output:

(gdb) b src/caffe/layers/base_conv_layer.cpp:117
No source file named src/caffe/layers/base_conv_layer.cpp.
Make breakpoint pending on future shared library load? (y or [n]) y

Breakpoint 1 (src/caffe/layers/base_conv_layer.cpp:117) pending.

3. Running the program

The command to run the program is r.

Output:

Starting program: /*/caffe/build/tools/caffe train --solver examples/mnist/lenet_solver.prototxt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
I0718 15:19:19.671941 29986 caffe.cpp:211] Use CPU.
[New Thread 0x7fffd81c7700 (LWP 29991)]
[New Thread 0x7fffd79c6700 (LWP 29992)]
I0718 15:19:20.437239 29986 solver.cpp:44] Initializing solver from parameters:
test_iter: 100
test_interval: 500
base_lr: 0.01
display: 100
max_iter: 10000
lr_policy: "inv"
gamma: 0.0001
power: 0.75
momentum: 0.9
weight_decay: 0.0005
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
solver_mode: CPU
net: "examples/mnist/lenet_train_test.prototxt"
train_state {
  level: 0
  stage: ""
}
I0718 15:19:20.437687 29986 solver.cpp:87] Creating training net from net file: examples/mnist/lenet_train_test.prototxt
I0718 15:19:20.438357 29986 net.cpp:294] The NetState phase (0) differed from the phase (1) specified by a rule in layer mnist
I0718 15:19:20.438398 29986 net.cpp:294] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
I0718 15:19:20.438499 29986 net.cpp:51] Initializing net from parameters:
name: "LeNet"
state {
  phase: TRAIN
  level: 0
  stage: ""
}
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
I0718 15:19:20.439380 29986 layer_factory.hpp:77] Creating layer mnist
I0718 15:19:20.439625 29986 db_lmdb.cpp:35] Opened lmdb examples/mnist/mnist_train_lmdb
I0718 15:19:20.439702 29986 net.cpp:84] Creating Layer mnist
I0718 15:19:20.439735 29986 net.cpp:380] mnist -> data
I0718 15:19:20.439853 29986 net.cpp:380] mnist -> label
I0718 15:19:20.444980 29986 data_layer.cpp:45] output data size: 64,1,28,28
I0718 15:19:20.445436 29986 base_data_layer.cpp:72] Initializing prefetch
[New Thread 0x7fffd603d700 (LWP 29993)]
I0718 15:19:20.448151 29986 base_data_layer.cpp:75] Prefetch initialized.
I0718 15:19:20.448186 29986 net.cpp:122] Setting up mnist
I0718 15:19:20.448216 29986 net.cpp:129] Top shape: 64 1 28 28 (50176)
I0718 15:19:20.448235 29986 net.cpp:129] Top shape: 64 (64)
I0718 15:19:20.448245 29986 net.cpp:137] Memory required for data: 200960
I0718 15:19:20.448264 29986 layer_factory.hpp:77] Creating layer conv1
I0718 15:19:20.448324 29986 net.cpp:84] Creating Layer conv1
I0718 15:19:20.448345 29986 net.cpp:406] conv1 <- data
I0718 15:19:20.448393 29986 net.cpp:380] conv1 -> conv1

Breakpoint 1, caffe::BaseConvolutionLayer<float>::LayerSetUp (this=0x91edd70,
    bottom=std::vector of length 1, capacity 1 = {...}, top=std::vector of length 1, capacity 1 = {...})
    at src/caffe/layers/base_conv_layer.cpp:117
117       channels_ = bottom[0]->shape(channel_axis_);
Missing separate debuginfos, use: debuginfo-install OpenEXR-libs-1.7.1-7.el7.x86_64 atk-2.14.0-1.el7.x86_64 atlas-3.10.1-10.el7.x86_64 boost-filesystem-1.53.0-26.el7.x86_64 boost-python-1.53.0-26.el7.x86_64 boost-system-1.53.0-26.el7.x86_64 boost-thread-1.53.0-26.el7.x86_64 cairo-1.14.2-1.el7.x86_64 expat-2.1.0-10.el7_3.x86_64 fontconfig-2.10.95-10.el7.x86_64 freetype-2.4.11-12.el7.x86_64 gdk-pixbuf2-2.31.6-3.el7.x86_64 gflags-2.1.1-6.el7.x86_64 glib2-2.46.2-4.el7.x86_64 glibc-2.17-157.el7_3.1.x86_64 glog-0.3.3-8.el7.x86_64 graphite2-1.3.6-1.el7_2.x86_64 gstreamer-0.10.36-7.el7.x86_64 gstreamer-plugins-base-0.10.36-10.el7.x86_64 gtk2-2.24.28-8.el7.x86_64 harfbuzz-0.9.36-1.el7.x86_64 hdf5-1.8.12-8.el7.x86_64 ilmbase-1.0.3-7.el7.x86_64 jasper-libs-1.900.1-29.el7.x86_64 jbigkit-libs-2.0-11.el7.x86_64 leveldb-1.12.0-11.el7.x86_64 libX11-1.6.3-3.el7.x86_64 libXau-1.0.8-2.1.el7.x86_64 libXcomposite-0.4.4-4.1.el7.x86_64 libXcursor-1.1.14-2.1.el7.x86_64 libXdamage-1.1.4-4.1.el7.x86_64 libXext-1.3.3-3.el7.x86_64 libXfixes-5.0.1-2.1.el7.x86_64 libXi-1.7.4-2.el7.x86_64 libXinerama-1.1.3-2.1.el7.x86_64 libXrandr-1.4.2-2.el7.x86_64 libXrender-0.9.8-2.1.el7.x86_64 libffi-3.0.13-18.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libgfortran-4.8.5-11.el7.x86_64 libjpeg-turbo-1.2.90-5.el7.x86_64 libpng-1.5.13-7.el7_2.x86_64 libquadmath-4.8.5-11.el7.x86_64 libselinux-2.5-6.el7.x86_64 libstdc++-4.8.5-11.el7.x86_64 libtiff-4.0.3-27.el7_3.x86_64 libv4l-0.9.5-4.el7.x86_64 libxcb-1.11-4.el7.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64 lmdb-libs-0.9.18-1.el7.x86_64 opencv-2.4.5-3.el7.x86_64 opencv-core-2.4.5-3.el7.x86_64 orc-0.4.22-5.el7.x86_64 pango-1.36.8-2.el7.x86_64 pcre-8.32-15.el7_2.1.x86_64 pixman-0.34.0-1.el7.x86_64 protobuf-2.5.0-8.el7.x86_64 python-libs-2.7.5-48.el7.x86_64 snappy-1.1.0-3.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64

Everything before "Breakpoint 1" is normal program log output; the program then pauses at the breakpoint.

The command to inspect a variable is p var:

(gdb) p channels_
$1 = 0

(gdb) p channel_axis_
$2 = 1

At this point channels_ is still 0, because line 117 has not executed yet. The command n (next) executes the current line and advances to the next one:

(gdb) n
118       num_output_ = this->layer_param_.convolution_param().num_output();

Now channels_ is 1. MNIST images are grayscale, so a channel count of 1 is expected:

(gdb) p channels_
$3 = 1

The command c (continue) resumes execution until the next breakpoint.

To debug GPU code, use cuda-gdb instead; its documentation is at http://docs.nvidia.com/cuda/cuda-gdb/index.html#axzz4nAAR7ujZ

References

  1. http://zhaok.xyz/blog/post/debug-caffe/
