As mentioned in the article "Computer Vision-Based Perception Systems for Autonomous Driving":
At the CVPR conference held in Las Vegas this June, Professor Raquel Urtasun of the University of Toronto and her students improved on the Siamese network from deep learning, replacing its concatenation layer with an inner-product layer and cutting the time to process a pair of images from about one minute to under one second.
Figure 7: Deep neural network with a Siamese structure
As shown in Figure 7, this Siamese deep neural network consists of left and right halves, each a multi-layer convolutional neural network (CNN), with the two CNNs sharing their weights. Offset-vector estimation for optical flow is recast as a classification problem: the input is a pair of 9x9 image patches, and the output is one of 128 or 256 possible offset vectors y. Patches sampled from image pairs with known offset vectors are fed into the left and right CNNs, and the cross-entropy is minimized:
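In standard form, with p_gt(y_i) a target distribution concentrated on the known offset at pixel i, p_i(y_i; w) the softmax over the network's inner-product scores, and w the shared CNN weights:

\min_{\mathbf{w}} \; -\sum_{i} \sum_{y_i} p_{\mathrm{gt}}(y_i) \, \log p_i(y_i; \mathbf{w})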
With this loss, we can train the entire network by standard supervised learning.
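To make the architecture concrete, here is a minimal Torch7 sketch of such a weight-sharing Siamese matcher with an inner-product output. It is an illustration only: the layer sizes and channel counts are assumptions, not the released split_win37_dep9/dot_win37_dep9 models.

-- Minimal sketch of a weight-sharing Siamese matcher (illustrative sizes).
require 'nn'

-- One CNN branch; both branches will share these weights.
local function buildBranch()
  local branch = nn.Sequential()
  branch:add(nn.SpatialConvolution(1, 32, 3, 3))   -- 9x9 patch -> 32 x 7x7
  branch:add(nn.ReLU())
  branch:add(nn.SpatialConvolution(32, 64, 3, 3))  -- -> 64 x 5x5
  branch:add(nn.ReLU())
  branch:add(nn.View(-1):setNumInputDims(3))       -- flatten to a 1600-d feature
  return branch
end

local left  = buildBranch()
-- clone() with parameter names shares the weights between the two branches
local right = left:clone('weight', 'bias', 'gradWeight', 'gradBias')

local net = nn.Sequential()
net:add(nn.ParallelTable():add(left):add(right))
net:add(nn.DotProduct())  -- inner product replaces the concatenation layer

-- Toy forward pass: one pair of 9x9 grayscale patches (batch size 1).
local score = net:forward({ torch.randn(1, 1, 9, 9), torch.randn(1, 1, 9, 9) })
print(score)  -- matching score for this candidate offset

The inner product is what makes the method fast: at test time the right image's features can be computed once over the whole search window, so a single pass scores every candidate offset, and a softmax over those scores yields the distribution p_i(y_i) used in the loss.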
On the KITTI Stereo 2012 dataset, this algorithm finishes in 0.34 seconds and reaches excellent accuracy: the offset estimation error is around 3-4 pixels, and the rate of estimates off by more than 3 pixels is 8.61%, both better than much slower competing methods.
After obtaining the distribution over offsets y_i at each pixel, we still need to impose spatial smoothness constraints; the paper experiments with three smoothing methods.
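As one illustrative example of this kind of post-processing (a common smoothing step in stereo pipelines, not necessarily one of the paper's three), local cost aggregation averages each candidate offset's score map over a small window before picking the best offset per pixel. A Torch7 sketch, with hypothetical sizes:

-- Hypothetical per-pixel score volume: 256 candidate offsets per pixel.
require 'nn'

local D, H, W = 256, 128, 256
local scores  = torch.randn(D, H, W)

-- 5x5 average pooling, stride 1, padding 2 keeps each score map the same size.
local aggregate = nn.SpatialAveragePooling(5, 5, 1, 1, 2, 2)
local smoothed  = aggregate:forward(scores)

-- The smoothed offset field is the per-pixel argmax over candidates.
local _, offsets = smoothed:max(1)

Averaging the score volume before taking the per-pixel argmax suppresses isolated mismatches.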
Together, these smoothing methods cut the offset estimation error by roughly another 50%, yielding a fairly accurate 2D offset-vector field. From it we can derive the 3D depth/distance estimate of the scene shown in Figure 8; such information is vital for autonomous driving.
Figure 8: Depth map
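The conversion behind Figure 8 is the standard stereo triangulation relation: for a rectified camera pair with focal length f and baseline B, a pixel with offset (disparity) d lies at depth

Z = \frac{f \, B}{d}

so larger offsets correspond to nearer points.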
Code: https://bitbucket.org/saakuraa/cvpr16_stereo_public/overview. The steps below are taken from the repository's README.
1. Run inference with the pretrained model:

th inference_match_subimg.lua -g 0 --model split_win37_dep9 --data_version kitti2015 --data_root pretrain/kitti2015/sample_img --model_param pretrain/kitti2015/param.t7 --bn_meanstd pretrain/kitti2015/bn_meanstd.t7 --saveDir outImg --start_id 1 --n 1
2. Results of the unary images will be saved in outImg.

To preprocess the training data, run

kitti2015_gene_loc_1(160,40,18,100,'debug_15',123)

to generate three binary files (~300 MB total), corresponding to the pixel locations you want to train and validate on.
Parameters: 160 is the number of images to train on; 40 is the number of images to validate on; 18 sets the image patch size to (2x18+1) by (2x18+1), i.e. 37x37, matching the win37 in the model names; 100 is the search range (a disparity range of 2x100+1 = 201 candidates); 'debug_15' is the folder results are saved to; and 123 is the random seed.

To train:

th train_match.lua -g 0 --tr_num 160 --val_num 40 --data_version kitti2015 -s logs/debug --model dot_win37_dep9 --psz 18 --util_root preprocess/debug_15 --data_root /ais/gobi3/datasets/kitti/scene_flow/training
Remember to change util_root (the one specified in the preprocessing step) and data_root to the proper directories. Note that this trains on only 160 images and validates on the remaining ones.
Use

th train_match.lua -h

for a more detailed explanation, and use the corresponding parameters to train longer for better performance.

To run inference with the trained model:

th inference_match_subimg.lua -g 0 --model split_win37_dep9 --data_version kitti2015 --data_root /ais/gobi3/datasets/kitti/scene_flow/training --perm_fn preprocess/debug_15/myPerm.bin --model_param logs/debug_15/param_epoch_10.t7 --bn_meanstd logs/debug_15/bn_meanvar_epoch_10.t7 --saveDir outImg --start_id 161 --n 1
Remember to change data_root to the proper image directory, perm_fn to the file-list permutation (generated automatically by the preprocessing script), model_param to the trained parameters, bn_meanstd to the batch-normalization statistics, and start_id to the first validation image id (since we train on 160 images, we validate on images from the 161st onward). It should take less than a second per image.
Use

th inference_match_subimg.lua -h

for a more detailed explanation.

This code is licensed under GPL-3.0. If you use our code in your research, please cite our paper as:
@inproceedings{luo16a,
title = {Efficient Deep Learning for Stereo Matching},
author = {Luo, W. and Schwing, A. and Urtasun, R.},
booktitle = {International Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2016},
}