Human Pose Estimation: Learning Feature Pyramids for Human Pose Estimation

User 1148525

Published 2019-05-26 11:43:58

Paper: Learning Feature Pyramids for Human Pose Estimation, ICCV 2017. Torch code: https://github.com/bearpaw/PyraNet

This paper focuses on the scale problem of human body parts (scale variations of human body parts). Such variation arises mainly when camera view changes or severe foreshortening occur.

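The proposed solution is to learn feature pyramids: the authors design Pyramid Residual Modules (PRMs) to strengthen a CNN's ability to extract scale information. They also identify two problems: existing ways to initialize the weights of multi-branch networks are problematic, and the activation variance accumulation introduced by identity mapping may be harmful in some scenarios. The paper proposes a remedy for each of these.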

3 Framework

3.1. Revisiting Stacked Hourglass Network

Intermediate supervision is added at the end of each hourglass in the stacked hourglass network.
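Intermediate supervision can be illustrated with a minimal sketch. It assumes that each hourglass outputs a heatmap (flattened to a list here), that every output is compared against the same ground-truth heatmap, and that the per-stack losses are mean squared errors summed together. The function names are hypothetical and are not taken from the PyraNet code.

```python
def mse(pred, target):
    """Mean squared error between two equally sized, flattened heatmaps."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def intermediate_supervision_loss(stack_outputs, target):
    """Sum the loss of every hourglass output (assumed form of the objective).

    stack_outputs: one predicted heatmap per hourglass, first to last.
    target: the ground-truth heatmap shared by all stacks.
    """
    return sum(mse(pred, target) for pred in stack_outputs)


# Two stacks: the first prediction misses the peak, the second matches it.
target = [0.0, 1.0, 0.0, 0.0]
outputs = [[0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
total = intermediate_supervision_loss(outputs, target)
```

Because every stack contributes its own term, early hourglasses receive a direct training signal instead of relying only on gradients from the final output.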

3.2. Pyramid Residual Modules (PRMs)

Four PRM structures are designed here.

The PRM can serve as a basic building block of a CNN, for human pose estimation or for image classification.
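The multi-scale idea behind a PRM can be illustrated with a toy sketch. This is not the paper's module: it assumes a 1-D signal instead of a feature map, average pooling for downsampling, nearest-neighbour upsampling, and a fixed scalar `transform` standing in for learned convolutions. All names here are hypothetical.

```python
def avg_pool_1d(xs, factor):
    """Downsample by averaging consecutive groups of `factor` samples."""
    groups = [xs[i:i + factor] for i in range(0, len(xs), factor)]
    return [sum(g) / len(g) for g in groups]


def upsample_nearest_1d(xs, length):
    """Nearest-neighbour upsampling back to `length` samples."""
    n = len(xs)
    return [xs[min(i * n // length, n - 1)] for i in range(length)]


def toy_pyramid_residual(xs, factors=(1, 2, 4), transform=lambda v: 0.5 * v):
    """Identity shortcut plus a sum of branches computed at several scales."""
    out = list(xs)  # identity shortcut
    for factor in factors:
        pooled = avg_pool_1d(xs, factor)          # view the input at a coarser scale
        feats = [transform(v) for v in pooled]    # stand-in for learned convolutions
        restored = upsample_nearest_1d(feats, len(xs))
        out = [o + r for o, r in zip(out, restored)]
    return out


signal = [0.0, 2.0, 4.0, 6.0]
result = toy_pyramid_residual(signal, factors=(2,))
```

Each branch sees the same input at a different resolution, so the summed output mixes fine and coarse context while the shortcut preserves the original signal.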

4.1. Initialization Multi-Branch Networks

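Existing weight initialization methods [33, 21, 24] are designed upon the assumption of a plain network without branches. This work uses multi-branch networks, so the existing initialization strategies are not well suited. From a theoretical derivation, the paper reaches the following conclusion: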

the number of input branches and output branches should be taken into consideration when initializing parameters.
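Why branch counts matter can be checked with a small simulation. The sketch below assumes linear units with zero-mean Gaussian weights and unit-variance inputs, and it assumes a Xavier-style rule in which fan-in and fan-out are multiplied by the number of input and output branches. The exact constants derived in the paper may differ, and the function names are hypothetical.

```python
import math
import random
import statistics


def forward_std(fan_in, in_branches=1):
    """Std that keeps the summed output variance near 1 (forward pass only)."""
    return math.sqrt(1.0 / (in_branches * fan_in))


def combined_std(fan_in, fan_out, in_branches=1, out_branches=1):
    """Xavier-style compromise between forward and backward pass (assumed form)."""
    return math.sqrt(2.0 / (in_branches * fan_in + out_branches * fan_out))


def summed_branch_variance(fan_in, branches, std, trials=2000, seed=0):
    """Empirical variance of a unit that sums `branches` linear branch outputs."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(trials):
        y = 0.0
        for _ in range(branches * fan_in):
            # weight ~ N(0, std^2), input ~ N(0, 1)
            y += rng.gauss(0.0, std) * rng.gauss(0.0, 1.0)
        outputs.append(y)
    return statistics.pvariance(outputs)


# Ignoring the 4 input branches inflates the variance by about 4x.
naive = summed_branch_variance(16, 4, forward_std(16))
aware = summed_branch_variance(16, 4, forward_std(16, in_branches=4))
```

With a single input and output branch, `combined_std` reduces to the usual Xavier rule, so the plain-network case stays unchanged.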

4.2. Output Variance Accumulation

Introducing identity mappings allows the network to become much deeper, but it also introduces a problem: identity mapping keeps increasing the variances of responses when the network goes deeper, which increases the difficulty of optimization.

How is this problem solved? By replacing the identity mappings with a BN-ReLU-Conv block.
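The accumulation effect can be reproduced with a toy simulation. It assumes that each residual branch contributes independent unit-variance noise, and it uses plain standardization of the shortcut as a stand-in for the BN-ReLU-Conv block, so it illustrates only the variance behaviour and not the actual layer. The function names are hypothetical.

```python
import random
import statistics


def standardize(xs):
    """Shift and scale to zero mean and unit variance (stand-in for batch norm)."""
    mean = statistics.mean(xs)
    std = statistics.pstdev(xs)
    return [(x - mean) / std for x in xs]


def simulate_variance(depth, normalize_shortcut, n_samples=4000, seed=0):
    """Variance of the responses after `depth` residual additions."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    for _ in range(depth):
        shortcut = standardize(xs) if normalize_shortcut else xs
        # Independent unit-variance noise stands in for the residual branch output.
        xs = [s + rng.gauss(0.0, 1.0) for s in shortcut]
    return statistics.pvariance(xs)


identity_var = simulate_variance(8, normalize_shortcut=False)
normalized_var = simulate_variance(8, normalize_shortcut=True)
```

With a pure identity shortcut the variance grows roughly linearly with depth, whereas a normalized shortcut keeps it bounded regardless of how many units are stacked.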