Efficient Deep Learning for Stereo Matching: Code


As noted in the article "Computer-Vision-Based Perception Systems for Autonomous Driving":

At the CVPR conference held in Las Vegas this June, Professor Raquel Urtasun of the University of Toronto and her students improved on the Siamese networks used in deep learning, replacing the concatenation layer with an inner-product layer and cutting the time to process an image pair from about one minute to under one second.

Figure 7: Deep neural network with a Siamese architecture

As shown in Figure 7, this Siamese deep neural network consists of left and right branches, each a multi-layer convolutional neural network (CNN), with the two CNNs sharing weights. The offset-vector estimation problem of optical flow is recast as a classification problem: the input is a pair of 9x9 image patches, and the output is one of 128 or 256 candidate offset vectors y. Patches extracted from image pairs with known offsets are fed into the left and right CNNs, and the cross-entropy

$$\min_{\mathbf{w}} \; -\sum_{i} \sum_{y_i} p_{gt}(y_i) \log p_i(y_i, \mathbf{w})$$

is minimized, so the whole network can be trained with supervised learning (a code sketch follows the symbol definitions), where:

  • i indexes pixels.
  • y_i ranges over the candidate offset vectors for pixel i.
  • p_gt is a smoothed target distribution that assigns a non-zero probability to estimates within one or two pixels of the true offset; gt stands for ground truth.
  • p_i(y_i, w) is the probability the network assigns to y_i given the weights w.
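
To make this concrete, here is a minimal NumPy sketch of the two ingredients described above: inner-product matching scores between the feature maps of the weight-sharing CNNs, and the smoothed cross-entropy loss. The function names and the (0.5, 0.2, 0.05) smoothing weights are illustrative assumptions, not code from the paper's repository:

import numpy as np

def match_scores(feat_left, feat_right, max_disp):
    # feat_*: (H, W, C) feature maps from the weight-sharing (Siamese) CNN.
    # Returns (H, W, max_disp + 1) scores: one inner product per candidate
    # offset, so features are computed once per image, not once per pair.
    H, W, C = feat_left.shape
    scores = np.full((H, W, max_disp + 1), -1e9)
    for d in range(max_disp + 1):
        # a left pixel at column x matches the right pixel at column x - d
        scores[:, d:, d] = np.einsum('hwc,hwc->hw',
                                     feat_left[:, d:], feat_right[:, :W - d])
    return scores

def smoothed_cross_entropy(scores, d_gt, weights=(0.5, 0.2, 0.05)):
    # Cross-entropy against a smoothed target p_gt that spreads probability
    # over offsets within 2 px of ground truth, so near-misses still get a
    # non-zero target probability (assumed weights, hypothetical values).
    H, W, D = scores.shape
    p = np.exp(scores - scores.max(axis=2, keepdims=True))
    p /= p.sum(axis=2, keepdims=True)            # softmax -> p_i(y_i, w)
    p_gt = np.zeros_like(p)
    hh, ww = np.meshgrid(np.arange(H), np.arange(W), indexing='ij')
    for off in (-2, -1, 0, 1, 2):
        d = d_gt + off
        ok = (d >= 0) & (d < D)
        p_gt[hh[ok], ww[ok], d[ok]] += weights[abs(off)]
    return -(p_gt * np.log(p + 1e-12)).sum(axis=2).mean()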

On the KITTI Stereo 2012 dataset, this algorithm finishes in 0.34 seconds and reaches excellent accuracy: the typical offset-estimation error is about 3-4 pixels, and the rate of estimates off by more than 3 pixels is 8.61%, both better than those of other, far slower algorithms.

Having obtained a distribution over y_i for each pixel, we still need to impose spatial smoothness. The article experiments with three approaches (a sketch of the first follows the list):

  • The simplest and most direct: 5x5 window averaging.
  • Semi-global block matching (SGBM), which adds a consistency constraint between neighboring pixels.
  • Superpixels plus slanted 3D planes.
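
A minimal sketch of the first option, window averaging over the matching distribution; the function smooth_cost_volume and the use of SciPy are illustrative choices, not the repository's code:

import numpy as np
from scipy.ndimage import uniform_filter

def smooth_cost_volume(prob, size=5):
    # Average each offset slice of an (H, W, D) probability volume over a
    # size x size spatial window; each slice is filtered independently.
    out = np.empty_like(prob)
    for d in range(prob.shape[2]):
        out[:, :, d] = uniform_filter(prob[:, :, d], size=size, mode='nearest')
    return out

# the smoothed per-pixel decision is then the argmax over offsets:
# disp = smooth_cost_volume(prob).argmax(axis=2)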

Taken together, these smoothing methods cut the offset-estimation error by roughly another 50%, yielding a fairly accurate 2D offset field. From it we can derive the 3D depth/distance estimates of the scene shown in Figure 8. This information is essential for autonomous driving.

Figure 8: Depth map
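
The article leaves the offset-to-depth conversion implicit. For a rectified stereo pair it is the standard pinhole relation, with f the focal length in pixels, B the camera baseline, and d the per-pixel offset (disparity):

$$Z = \frac{f \cdot B}{d}$$

For the KITTI setup, whose baseline is roughly 0.54 m, this maps the offset field directly to the metric depth map of Figure 8.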

Code: https://bitbucket.org/saakuraa/cvpr16_stereo_public/overview

Efficient Deep Learning for Stereo Matching

Using pretrained model

  1. We include pretrained models for KITTI2015 and KITTI2012 in pretrain. To use the pretrained KITTI2015 model (similar for KITTI2012), run:
     th inference_match_subimg.lua -g 0 --model split_win37_dep9 --data_version kitti2015 --data_root pretrain/kitti2015/sample_img --model_param pretrain/kitti2015/param.t7 --bn_meanstd pretrain/kitti2015/bn_meanstd.t7 --saveDir outImg --start_id 1 --n 1
  2. Results of the unary images will be saved in outImg.

Training

Prepare data

  1. Set the variable data_root in preprocess/kitti2015_gene_loc_1.m to the path of the corresponding training folder.
  2. Go to the preprocess folder and run in MATLAB (see the sanity check after this list):
     kitti2015_gene_loc_1(160,40,18,100,'debug_15',123)
     This generates three binary files (~300MB total) holding the pixel locations you want to train and validate on. Parameters: 160 is the number of images to train on, 40 the number of images to validate on, 18 gives an image-patch size of (2x18+1) by (2x18+1), 100 the search range (a disparity range of 2x100+1), 'debug_15' the folder where results are saved, and 123 the random seed.
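
A quick sanity check on those parameters (plain arithmetic, not code from the repo):

# arguments to kitti2015_gene_loc_1(160, 40, 18, 100, 'debug_15', 123)
psz, half_range = 18, 100
patch_side = 2 * psz + 1             # 37 -> the 'win37' in the model names
n_disp_classes = 2 * half_range + 1  # 201 candidate disparities
print(patch_side, n_disp_classes)    # 37 201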

Running training script

  1. Install/update torch
  2. An example run:
     th train_match.lua -g 0 --tr_num 160 --val_num 40 --data_version kitti2015 -s logs/debug --model dot_win37_dep9 --psz 18 --util_root preprocess/debug_15 --data_root /ais/gobi3/datasets/kitti/scene_flow/training
     Remember to change util_root (the one specified in the preprocess step) and data_root to the proper directories. Note that this trains on only 160 images and validates on the remaining ones. Use th train_match.lua -h for a more detailed explanation, and adjust the corresponding parameters to train longer for better performance.
  3. The training log is saved in logs/debug by default, along with the model parameters and batch-normalization statistics from training.

Testing

  1. A sample run:
     th inference_match_subimg.lua -g 0 --model split_win37_dep9 --data_version kitti2015 --data_root /ais/gobi3/datasets/kitti/scene_flow/training --perm_fn preprocess/debug_15/myPerm.bin --model_param logs/debug_15/param_epoch_10.t7 --bn_meanstd logs/debug_15/bn_meanvar_epoch_10.t7 --saveDir outImg --start_id 161 --n 1
     Remember to set data_root to the proper image directory, perm_fn to the file-list permutation (generated automatically by the preprocess script), model_param to the trained parameters, bn_meanstd to the batch-normalization statistics, and start_id to the first validation image id (since we train on 160 images, we validate from the 161st). It should take less than a second per image. Use th inference_match_subimg.lua -h for a more detailed explanation.
  2. Results are saved in outImg by default. You should see the unary output images for both the left and the right image.
  3. No postprocessing is applied by default; it requires additional libraries.

Misc

  1. Apply the same steps to run on the KITTI 2012 stereo dataset.
  2. For postprocessing, you need to set up the corresponding code from MC-CNN or SPS. By default, the inference code outputs unary images for both the left and right image.

License

This code is licensed under GPL-3.0. If you use our code in your research, please cite our paper as:

@inproceedings{luo16a,
    title = {Efficient Deep Learning for Stereo Matching},
    author = {Luo, W. and Schwing, A. and Urtasun, R.},
    booktitle = {International Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2016},
}