CNN-based Cascaded Multi-task Learning of High-level Prior and Density Estimation for Crowd Counting
International Conference on Advanced Video and Signal Based Surveillance (AVSS) 2017
Torch: https://github.com/svishwa/crowdcount-cascaded-mtl
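This paper addresses the large variation between crowd scenes in crowd density estimation: people appear at widely varying scales and with widely varying appearance. In the paper's words: "the issue of large variations in scale and appearance of the objects that occurs due to severe perspective distortion of the scene".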
The proposed approach is a CNN with a high-level prior embedded in the network: "The aim of this work is to learn models that cater to a wide variety of density levels present in the data set by incorporating a high-level prior into the network."
The high-level prior assigns each image to one of several classes according to the approximate total number of people in it; the paper uses 10 classes. "The high-level prior learns to classify the count into various groups whose class labels are based on the number of people present in the image."
This high-level prior is not affected by scale variations and gives a rough estimate of the total number of people in the image: "By exploiting count labels, the high-level prior is able to estimate coarse count of people in the entire image irrespective of scale variations thereby enabling the network to learn more discriminative global features."
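To make the count grouping concrete, here is a minimal sketch of turning a per-image person count into one of ten class labels. The equal-width bins, the function name `count_to_class`, and the example counts are assumptions made for illustration; these notes do not state how the paper chooses its bin boundaries.

```python
def count_to_class(count, max_count, num_classes=10):
    """Map a person count to a class index in [0, num_classes - 1].

    Assumption: equal-width bins over [0, max_count]. The actual bin
    boundaries used by the paper are not given in these notes.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    bin_width = max_count / num_classes
    # Counts at or above max_count fall into the last class.
    return min(int(count / bin_width), num_classes - 1)


# Hypothetical per-image counts from a dataset whose largest count is 2000.
counts = [12, 250, 999, 1000, 2000]
labels = [count_to_class(c, max_count=2000) for c in counts]
print(labels)
```

These labels would serve as the targets of the count classifier, while the density map remains the target of the regression branch.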
3 Proposed method
The first two convolutional layers of the CNN extract shared features, after which the network splits into two branches. The first branch is the high-level prior stage. What does this branch do? It classifies the crowd into several groups: "quantize the crowd count into ten groups and learn a crowd count group classifier which also performs the task of incorporating high-level prior into the network".
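The second branch extracts features with four more convolutional layers, combines them with the features from the first branch, and then upsamples the feature maps with fractionally strided convolutions to produce a higher-resolution density map.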
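As a small illustration of how fractionally strided (transposed) convolutions enlarge a feature map, the sketch below applies the standard output-size formula for a transposed convolution with dilation 1. The specific numbers (a 64-pixel feature map, two stride-2 layers with 4x4 kernels and padding 1) are assumptions chosen for the example, not values taken from these notes.

```python
def conv_transpose_out(size, kernel, stride, padding=0, output_padding=0):
    """Output length along one axis of a transposed convolution (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Assumption: earlier pooling reduced a 256-pixel side to 64 (a factor of 4),
# and two stride-2 transposed convolutions are used to undo that reduction.
side = 64
for _ in range(2):
    side = conv_transpose_out(side, kernel=4, stride=2, padding=1)
print(side)
```

With these settings each layer exactly doubles the spatial size, so the density map comes out at the resolution the feature map had before downsampling.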
Loss functions:
1) cross-entropy loss function for the high-level prior stage
2) loss function for the density estimation stage
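The two losses can be sketched together as follows. The notes do not spell out the density loss, so this sketch assumes a pixel-wise squared Euclidean loss; the weighting factor `weight` and all function names are likewise hypothetical, and the toy values are made up.

```python
import math


def cross_entropy(probs, label):
    """Negative log-probability that the classifier gives the true count class."""
    return -math.log(probs[label])


def density_loss(pred, target):
    """Sum of squared pixel differences between two density maps (lists of rows)."""
    return sum((p - t) ** 2
               for pred_row, target_row in zip(pred, target)
               for p, t in zip(pred_row, target_row))


def joint_loss(probs, label, pred, target, weight=1.0):
    """Weighted classification loss plus density loss (the weighting is an assumption)."""
    return weight * cross_entropy(probs, label) + density_loss(pred, target)


# Toy example: 3 count classes and a 2x2 density map.
probs = [0.1, 0.7, 0.2]          # softmax output of the prior stage
label = 1                        # true count class
pred = [[0.0, 0.5], [0.25, 0.0]]
target = [[0.0, 1.0], [0.25, 0.0]]
loss = joint_loss(probs, label, pred, target, weight=0.5)
print(loss)
```

Training both terms jointly is what lets the classification branch shape the shared features used by the density branch.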
Ground truth density map generation: calculated by summing a 2D Gaussian kernel centered at every person’s location x
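The construction can be sketched in plain Python as below. The fixed `sigma`, the per-kernel normalization over the image grid, and the example head positions are assumptions for illustration; the notes do not give the kernel parameters the paper uses.

```python
import math


def density_map(height, width, points, sigma=4.0):
    """Sum one normalized 2D Gaussian per annotated head position.

    points: list of (row, col) positions inside the image.
    sigma: kernel spread in pixels (a fixed value, chosen for illustration).
    """
    dmap = [[0.0] * width for _ in range(height)]
    for py, px in points:
        # Evaluate an unnormalized Gaussian centered at (py, px) on the pixel grid.
        kernel = [[math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
                   for x in range(width)]
                  for y in range(height)]
        total = sum(map(sum, kernel))
        # Normalize so each person contributes exactly 1 to the sum of the map.
        for y in range(height):
            for x in range(width):
                dmap[y][x] += kernel[y][x] / total
    return dmap


# Hypothetical annotations: three heads in a 48x64 image.
points = [(10, 12), (30, 40), (31, 42)]
dmap = density_map(48, 64, points)
estimated_count = sum(map(sum, dmap))
print(round(estimated_count, 6))
```

Because each kernel is normalized, summing the map recovers the number of annotated people up to floating-point error, which is why a predicted density map can be integrated to get a count.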
4 Experimental results