Crowd Density Estimation - CrowdNet: A Deep Convolutional Network for Dense Crowd Counting

User 1148525

Published 2018-01-03 15:55:46

CrowdNet: A Deep Convolutional Network for Dense Crowd Counting
Published in the proceedings of the ACM Conference on Multimedia (ACMMM), 2016
http://val.serc.iisc.ernet.in/CrowdNet/
Caffe: https://github.com/davideverona/deep-crowd-counting_crowdnet

For crowd density estimation, the paper combines a deep and a shallow fully convolutional network to cope with large scale variations. The combination captures both high-level semantic information (face/body detectors) and low-level features (blob detectors).

The network architecture is as follows:

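Deep Network
The deep network mainly captures high-level semantics. It uses a VGG-like architecture with the fully connected layers removed, which makes the network fully convolutional. The original VGG network has five max-pool layers, each with a stride of 2, so its final feature map is only 1/32 of the input image size. Because the output here must be a pixel-level crowd density map, the paper sets the stride of the fourth max-pool layer to 1 and removes the fifth pooling layer, so the final feature map is 1/8 of the input image size.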
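The 1/32 and 1/8 figures follow from multiplying the pooling strides. A minimal sketch of that arithmetic is below; the function name `output_stride` and the list representation of the strides are illustrative assumptions, not code from the paper or its Caffe release.

```python
from math import prod

def output_stride(pool_strides):
    """Overall downsampling factor of a stack of pooling layers.

    Each pooling layer divides the spatial size by its stride, so the
    total factor is the product of all strides.
    """
    return prod(pool_strides)

# Original VGG: five max-pool layers, each with a stride of 2.
vgg_strides = [2, 2, 2, 2, 2]

# Modified deep network: the fourth max-pool layer has stride 1 and the
# fifth pooling layer is removed.
deep_net_strides = [2, 2, 2, 1]

print(output_stride(vgg_strides))       # 32 -> feature map is 1/32 of the input
print(output_stride(deep_net_strides))  # 8  -> feature map is 1/8 of the input
```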

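Setting the stride of the fourth max-pool layer to 1 causes a receptive-field mismatch in the convolution layers that follow it. To handle this, the paper uses the dilated convolution of reference [4]. The effect is that the later layers see the same receptive field as if the fourth max-pool layer still had a stride of 2.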
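To see why dilation compensates for the removed stride, here is a 1-D sketch. It is a toy illustration of the general idea, not the paper's layers: the helper names, the signal, and the kernel values are all hypothetical. A kernel with dilation 2 applied to a stride-1 signal, when sampled at every second position, gives the same values as the ordinary kernel applied to the stride-2 signal.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D correlation with `dilation` samples between kernel taps."""
    span = effective_kernel_size(len(kernel), dilation)
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]

# Hypothetical feature row produced with stride 1 (nothing was dropped).
fine = [1, 4, 2, 8, 5, 7, 3, 6, 9, 0]
# The same row as it would look after a stride-2 layer.
coarse = fine[::2]

kernel = [1, 2, 1]  # arbitrary illustrative weights

# Ordinary convolution on the stride-2 signal.
on_coarse = dilated_conv1d(coarse, kernel, dilation=1)
# Dilation-2 convolution on the stride-1 signal, at full resolution.
on_fine = dilated_conv1d(fine, kernel, dilation=2)

# Every second output of the dilated version matches the stride-2 result,
# so the kernel "sees" the same context while the resolution is kept.
print(on_fine[::2] == on_coarse)
```

The dilated version also produces the in-between positions, which is what keeps the feature map at the higher resolution.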

Shallow Network
The shallow convolutional network is used mainly to detect the heads of people far from the camera, that is, for the detection of small head-blobs.

Combination of Deep and Shallow Networks
The outputs of the deep and shallow networks, both at 1/8 of the input image size, are concatenated and passed through a 1x1 convolution layer. The result is then upsampled to the size of the input image using bilinear interpolation to obtain the final crowd density prediction.
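The fusion step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the Caffe implementation: each network contributes a single channel here for brevity, the weights are made up, and the corner-aligned sampling in `bilinear_upsample` is one common convention that may differ from the one the original code uses.

```python
def conv1x1(feature_maps, weights, bias=0.0):
    """1x1 convolution to one output channel.

    feature_maps: list of channels, each a 2-D list of equal size.
    The output at each pixel is a weighted sum over channels.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [bias + sum(w * fm[y][x] for w, fm in zip(weights, feature_maps))
         for x in range(width)]
        for y in range(height)
    ]

def bilinear_upsample(grid, factor):
    """Upsample a 2-D list by an integer factor (corner-aligned sampling)."""
    in_h, in_w = len(grid), len(grid[0])
    out_h, out_w = in_h * factor, in_w * factor
    out = []
    for oy in range(out_h):
        sy = oy * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        fy = sy - y0
        row = []
        for ox in range(out_w):
            sx = ox * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            fx = sx - x0
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

# Hypothetical 2x2 outputs of the two branches (already at 1/8 resolution).
deep_out = [[1.0, 2.0], [3.0, 4.0]]
shallow_out = [[0.0, 1.0], [1.0, 0.0]]

# Concatenate along the channel axis, fuse with a 1x1 convolution,
# then upsample by 8 to return to the input resolution.
concatenated = [deep_out, shallow_out]
fused = conv1x1(concatenated, weights=[0.5, 0.5])
density = bilinear_upsample(fused, factor=8)

print(len(density), len(density[0]))
```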

3.2 Ground Truth
The ground truth is generated by simply blurring each head annotation using a Gaussian kernel normalized to sum to one. With this normalization, the density map sums to the number of people in the image.
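A minimal sketch of this ground-truth construction is shown below. The kernel radius, the sigma value, the image size, and the head coordinates are illustrative assumptions; the source does not state the values used. Note that in this sketch any kernel mass falling outside the image is simply dropped, so heads near the border contribute slightly less than one.

```python
import math

def gaussian_kernel(radius, sigma):
    """Square Gaussian kernel of side 2*radius+1, normalized to sum to one."""
    size = 2 * radius + 1
    kernel = [
        [math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(map(sum, kernel))
    return [[value / total for value in row] for row in kernel]

def density_map(height, width, heads, radius=3, sigma=1.5):
    """Place one normalized Gaussian at each (row, col) head annotation."""
    kernel = gaussian_kernel(radius, sigma)
    density = [[0.0] * width for _ in range(height)]
    for head_y, head_x in heads:
        for ky, kernel_row in enumerate(kernel):
            for kx, value in enumerate(kernel_row):
                y, x = head_y + ky - radius, head_x + kx - radius
                if 0 <= y < height and 0 <= x < width:
                    density[y][x] += value
    return density

# Three hypothetical head annotations, all far enough from the border
# that no kernel mass is clipped.
heads = [(5, 5), (10, 12), (14, 7)]
dmap = density_map(20, 20, heads)

# Summing the density map recovers the head count (up to rounding error).
count = sum(map(sum, dmap))
print(round(count, 6))
```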

3.3 Data Augmentation
Two types of augmentation are primarily performed. 1) To handle scale variations, training samples are taken at multiple scales.