Face Detection--S3FD: Single Shot Scale-invariant Face Detector

User 1148525

Published 2019-05-26 12:01:12


S3FD: Single Shot Scale-invariant Face Detector, ICCV 2017. Caffe code will be available.

This paper analyzes why anchor-based detectors have a low detection rate on small faces and proposes improvements.

Anchor-based object detection has advanced quickly, and face detection has made great progress with it, but results on small faces are still poor: the performance of anchor-based detectors drops dramatically as the objects become smaller.

Biased framework. Anchor-based detection frameworks tend to miss small and medium faces, for the following reasons:
1) The stride of the lowest anchor-associated layer is too large (8 pixels in SSD, 16 pixels in Faster R-CNN). Small and medium faces are highly compressed on these layers, so only a few features remain for detection.
2) Small faces, anchor scales, and receptive fields are mutually mismatched.
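To make the first reason concrete, here is a minimal sketch of how few feature-map cells a small face covers at a given stride. The helper name `cells_per_side` and the 16-pixel face are illustrative assumptions, not taken from the paper.

```python
def cells_per_side(face_px: float, stride: int) -> float:
    """Approximate number of feature-map cells a face spans along one side."""
    return face_px / stride


# A hypothetical 16-pixel face on layers with stride 4, 8 and 16.
for stride in (4, 8, 16):
    side = cells_per_side(16, stride)
    print(f"stride {stride:2d}: about {side:g} x {side:g} cells")
```

At stride 16 the whole face falls into roughly a single cell, which is why a smaller stride on the lowest detection layer helps.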

3 Single shot scale-invariant face detector

[Figure: pipeline of the detection system]

3.1. Scale-equitable framework

The strides of the anchor-associated layers double from 4 to 128, which ensures that different scales of faces have adequate features for detection at the corresponding anchor-associated layers.

Our anchor scales range from 16 to 512, based on the effective receptive field and our equal-proportion interval principle.

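Constructing architecture. Our network consists of the following parts:
1) Base Convolutional Layers: keep the VGG16 layers from conv1_1 to pool5 and remove all the other layers.
2) Extra Convolutional Layers: extra convolutional layers are added to obtain multi-scale feature maps.
3) Detection Convolutional Layers: conv3_3, conv4_3, conv5_3, conv_fc7, conv6_2 and conv7_2 are selected as the detection layers, each predicting with anchors of a different scale.
4) Normalization Layers: conv3_3, conv4_3 and conv5_3 have different feature scales, so L2 normalization [27] is applied to normalize them.
5) Predicted Convolutional Layers: each detection layer is followed by a p×3×3×q convolutional layer for prediction, where p and q are the channel numbers of the input and output, and 3×3 is the kernel size. For each anchor we predict 4 coordinate offsets and N_s classification scores, where N_s = N_m + 1 for the conv3_3 detection layer (N_m is the max-out background label) and N_s = 2 for the other detection layers.
6) Multi-task Loss Layer: softmax loss is used for classification and smooth L1 loss for location regression.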
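Items 5) and 6) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' Caffe code: it assumes one anchor per feature-map location, label 0 for background and 1 for face, and an equal weighting between the two loss terms (`loc_weight=1.0`), none of which is stated above. `N_m` is left as a caller-supplied parameter.

```python
import math


def head_output_channels(layer: str, n_m: int) -> int:
    """Output channels q of the prediction conv: 4 offsets + N_s scores.

    Assumes one anchor per location. conv3_3 uses N_s = N_m + 1, others N_s = 2.
    """
    n_s = n_m + 1 if layer == "conv3_3" else 2
    return 4 + n_s


def smooth_l1(x: float) -> float:
    """Standard smooth L1: quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def softmax_cross_entropy(logits, label: int) -> float:
    """Softmax loss for one anchor, computed in a numerically stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def anchor_loss(cls_logits, label, pred_offsets, target_offsets, loc_weight=1.0):
    """Classification loss, plus regression loss for positive anchors only."""
    loss = softmax_cross_entropy(cls_logits, label)
    if label != 0:  # assumed convention: 0 = background, 1 = face
        loc = sum(smooth_l1(p - t) for p, t in zip(pred_offsets, target_offsets))
        loss += loc_weight * loc
    return loss
```

For example, `head_output_channels("conv4_3", n_m)` is 6 for any `n_m`, while the conv3_3 head grows with the number of background scores.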

Designing scales for anchors. Each of the 6 detection layers uses square anchors of a different scale.

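This involves the concept of the effective receptive field. As pointed out in [29], a CNN unit has two types of receptive field: the theoretical receptive field and the effective receptive field.

[Figure: relationship between the theoretical and effective receptive fields]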

We adopt a design principle here, the equal-proportion interval principle: the scale of each anchor is 4 times its interval, that is, anchor scale = 4 × stride. This guarantees that anchors of different scales have the same density on the image, so that faces of various scales can match approximately the same number of anchors.
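The principle can be checked with a short sketch that derives the anchor scales from the strides and tiles square anchors over an image. The function names and the square 640-pixel input are assumptions made for illustration. The strides follow the doubling from 4 to 128 described above.

```python
STRIDES = [4, 8, 16, 32, 64, 128]  # one per detection layer, conv3_3 ... conv7_2
ANCHOR_RATIO = 4                   # anchor scale = 4 x stride


def anchor_scales(strides=STRIDES, ratio=ANCHOR_RATIO):
    """Anchor scale for each detection layer."""
    return [ratio * s for s in strides]


def generate_anchors(image_size: int, stride: int, ratio: int = ANCHOR_RATIO):
    """Square anchors (cx, cy, w, h), one centred on each feature-map cell."""
    scale = ratio * stride
    cells = image_size // stride
    anchors = []
    for row in range(cells):
        for col in range(cells):
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            anchors.append((cx, cy, scale, scale))
    return anchors


print(anchor_scales())
for s in STRIDES:
    # scale / stride is the same on every layer, so the density is equal
    print(s, len(generate_anchors(640, s)), ANCHOR_RATIO * s / s)
```

Because the ratio of scale to stride is the same on every layer, a face of any scale sees the same number of anchor centres per unit of its own size.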

3.2. Scale compensation anchor matching strategy

Anchor scales are discrete while face scales in images are continuous. Faces whose scales fall between two adjacent anchor scales cannot match enough anchors, which leads to their low recall rate.

We use the following two stages to solve this problem:
1) Stage one: decrease the matching threshold from 0.5 to 0.35 in order to increase the average number of matched anchors.
2) Stage two: for a face, first pick out the anchors whose jaccard overlap with it is higher than 0.1, then sort them and select the top N as the matched anchors of this face. N is set to the average number from stage one.
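The two stages can be sketched as follows. This is a simplified illustration, not the reference implementation: boxes are `(x1, y1, x2, y2)` tuples, `top_n` is passed in by the caller rather than computed over a dataset, and applying stage two only to faces that got fewer than `top_n` matches in stage one is an assumption of this sketch.

```python
def jaccard(a, b):
    """Jaccard overlap (IoU) of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def stage_one_matches(faces, anchors, thresh=0.35):
    """Stage one: anchors whose overlap with a face exceeds the lowered threshold."""
    return [[i for i, a in enumerate(anchors) if jaccard(f, a) > thresh]
            for f in faces]


def stage_two_matches(face, anchors, top_n, thresh=0.1):
    """Stage two: top-N anchors among those with overlap higher than 0.1."""
    candidates = [(jaccard(face, a), i) for i, a in enumerate(anchors)]
    candidates = [c for c in candidates if c[0] > thresh]
    candidates.sort(reverse=True)  # highest overlap first
    return [i for _, i in candidates[:top_n]]


def match_anchors(faces, anchors, top_n):
    """Run stage one, then compensate faces with too few matched anchors."""
    result = []
    for face, matched in zip(faces, stage_one_matches(faces, anchors)):
        if len(matched) < top_n:  # assumption: only under-matched faces use stage two
            matched = stage_two_matches(face, anchors, top_n)
        result.append(matched)
    return result
```

Since stage-one matches have the highest overlaps, they always reappear at the top of the stage-two ranking, so stage two only adds anchors and never drops one.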

3.3. Max-out background label

Because conv3_3 uses small anchors, it produces a high false positive rate of small faces.

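We propose to apply a more sophisticated classification strategy on the lowest layer to handle the complicated background from small anchors. The background here is too complex to be lumped into a single class, so we subdivide it into several classes and keep face as one class. This raises the probability that complicated background is classified correctly.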

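To remove these false positives, we divide the background into N_m classes for each of the smallest anchors. When computing the class scores at each location, we obtain N_m background scores and choose the highest one as the final background score used to compute the softmax loss. The max-out operation integrates some local optimal solutions to reduce the false positive rate of small faces.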
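A minimal sketch of the max-out step for a single conv3_3 anchor, assuming the head outputs N_m background scores followed by one face score (the ordering and the function name are assumptions made for illustration):

```python
import math


def max_out_scores(background_logits, face_logit):
    """Keep the highest of the N_m background scores, then apply a
    two-way softmax over (background, face)."""
    bg = max(background_logits)
    m = max(bg, face_logit)  # subtract the max for numerical stability
    e_bg = math.exp(bg - m)
    e_face = math.exp(face_logit - m)
    total = e_bg + e_face
    return e_bg / total, e_face / total


# Hypothetical logits for N_m = 3 background scores and one face score.
p_bg, p_face = max_out_scores([0.2, 2.5, -1.0], 1.0)
print(p_bg, p_face)
```

An anchor is called a face only if its face score beats the strongest of the N_m background scores, which is what makes false positives on small anchors harder to produce.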

F: only uses the scale-equitable framework
F+S: applies the scale-equitable framework and the scale compensation anchor matching strategy
F+S+M: applies the scale-equitable framework, the scale compensation anchor matching strategy and the max-out background label

Comparing these three settings shows that the scale-equitable framework is crucial.

We measure the speed using Titan X (Pascal) and cuDNN v5.1 with Intel Xeon E5-2683v3@2.00GHz. For the VGA-resolution image with batch size 1 using a single GPU, our face detector can run at 36 FPS and achieve the real-time speed. Besides, about 80% of the forward time is spent on the VGG16 network, hence using a faster base network could further improve the speed
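As a rough back-of-the-envelope check of the last sentence, the sketch below projects the frame rate if only the base network were sped up. The 2x speedup is a hypothetical value, and the 80% share is the approximate figure quoted above, so the result is an estimate rather than a measurement.

```python
def projected_fps(fps: float = 36.0, base_fraction: float = 0.8,
                  base_speedup: float = 2.0) -> float:
    """Frame rate after speeding up only the base-network share of the time."""
    frame_ms = 1000.0 / fps
    new_ms = frame_ms * ((1.0 - base_fraction) + base_fraction / base_speedup)
    return 1000.0 / new_ms


print(round(1000.0 / 36.0, 1))    # per-frame time in ms at 36 FPS
print(round(projected_fps(), 1))  # with a hypothetical 2x faster base network
```

Under these assumptions the remaining 20% of the forward time caps the gain: even an infinitely fast base network could not exceed 36 / 0.2 = 180 FPS.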


Originally published: 2017-08-22
