Detect to Track and Track to Detect, ICCV 2017. Code: https://github.com/feichtenhofer/detect-track
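This paper addresses video object detection with a unified framework that performs detection and tracking at the same time. In the authors' words: "In this paper we propose a unified approach to tackle the problem of object detection in realistic video."
The ImageNet video object detection challenge (VID) is currently a fairly influential benchmark for this task.
Video object detection is difficult, mainly for the following reasons:
(i) size: the amount of video data is large. VID has around 1.3M images, compared to around 400K in DET or 100K in COCO.
(ii) motion blur: images are blurred due to rapid camera or object motion.
(iii) quality: the quality of internet video is very uneven.
(iv) partial occlusion: occlusion is sometimes severe.
(v) pose: unconventional object-to-camera poses are frequently seen in video, so poses are highly diverse.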
3 D&T Approach: Detect and Track (D&T)
3.1. D&T overview
We aim at jointly detecting and tracking (D&T) objects in video. The method builds on the R-FCN detection framework and extends it for multi-frame detection and tracking.
The overall network architecture is as follows:
The main highlight is the proposed RoI tracking module, which associates objects between two frames and thereby tracks them: "We compute correlation maps for all positions in a feature map and let RoI pooling operate on these feature maps for track regression."
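To make the idea of correlation maps between two frames concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `local_correlation`, the parameter `max_disp`, the restriction to a small square displacement window, and the zero padding at the borders are all choices made for this example. For every position in the feature map of frame t, it takes the dot product with the features of frame t+τ at each nearby displacement.

```python
import numpy as np


def local_correlation(feat_t, feat_t_tau, max_disp=1):
    """Correlate two feature maps over a local displacement window.

    feat_t, feat_t_tau: arrays of shape (C, H, W) from two frames.
    max_disp: hypothetical parameter, largest displacement d considered.
    Returns an array of shape (2d+1, 2d+1, H, W) whose entry
    [p+d, q+d, i, j] is <feat_t[:, i, j], feat_t_tau[:, i+p, j+q]>.
    """
    c, h, w = feat_t.shape
    d = max_disp
    # Zero-pad the second frame so displaced lookups never leave the array
    # (assumption of this sketch: out-of-image features count as zero).
    padded = np.pad(feat_t_tau, ((0, 0), (d, d), (d, d)))
    out = np.zeros((2 * d + 1, 2 * d + 1, h, w), dtype=np.float64)
    for p in range(-d, d + 1):
        for q in range(-d, d + 1):
            # shifted[:, i, j] equals feat_t_tau[:, i + p, j + q]
            shifted = padded[:, d + p:d + p + h, d + q:d + q + w]
            # Dot product over the channel axis at every position
            out[p + d, q + d] = (feat_t * shifted).sum(axis=0)
    return out


# Toy usage: the second frame is the first one moved one pixel to the right.
rng = np.random.default_rng(0)
frame_a = rng.standard_normal((8, 4, 5))
frame_b = np.roll(frame_a, shift=1, axis=2)
corr = local_correlation(frame_a, frame_b, max_disp=1)
print(corr.shape)  # (3, 3, 4, 5)
```

In this toy case the response at displacement (p, q) = (0, 1) equals the squared feature norm at every position except the last column, which is how a correlation map encodes where a feature moved. In the paper's pipeline, RoI pooling then operates on such maps to regress the track; that step is not shown here.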
Performance comparison on the ImageNet VID validation set