RPN in Faster R-CNN

YoungTimes

Published 2022-04-28 13:08:26

This article is part of the column: 半杯茶的小酒杯

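The most notable contribution of Faster R-CNN is the Region Proposal Network (RPN), which replaces Selective Search. While preserving detection accuracy, it cuts detection time by roughly a factor of 10, making near real-time object detection possible.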

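According to the timing comparison in the paper, Selective Search (SS) takes about 1.5 s on average to compute proposals, whereas the whole Proposal + Detection pipeline of Faster R-CNN takes 198 ms. Because the convolutional feature layers are shared, the RPN itself needs only about 10 ms to produce proposals.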

The RPN

The RPN generates region proposals as follows:

we slide a small network over the conv feature map output by the last shared conv layer

Region Proposal Network

RPN structure: a 3x3 conv layer followed by two sibling 1x1 conv layers (reg + cls). The conv layer (the default activation function of slim.conv2d is ReLU):

rpn = slim.conv2d(net_conv,
         cfg.RPN_CHANNELS,
         [3, 3],     
         trainable=is_training, 
         weights_initializer=initializer,
         scope="rpn_conv/3x3")

cls layer:

rpn_cls_score = slim.conv2d(rpn,
               self._num_anchors * 2,
               [1, 1], 
               trainable=is_training,
               weights_initializer=initializer,
               padding='VALID', 
               activation_fn=None,
               scope='rpn_cls_score')

reg layer:

rpn_bbox_pred = slim.conv2d(rpn,
                 self._num_anchors * 4,
                 [1, 1], 
                 trainable=is_training,
                 weights_initializer=initializer,
                 padding='VALID', 
                 activation_fn=None,
                 scope='rpn_bbox_pred')

Because the network is applied in a sliding-window fashion, the same parameters are shared across every position of the feature map.
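
To make the parameter sharing concrete, here is a minimal NumPy sketch (not the article's TensorFlow code) of the two 1x1 heads. The helper name conv1x1, the random weights, and the feature-map size are assumptions chosen for illustration; the point is only that one weight matrix is applied at every spatial position, so the parameter count does not depend on the feature-map size.

```python
import numpy as np

def conv1x1(feature_map, weights, bias):
    """1x1 convolution: the same (C_in, C_out) weights are applied at every position."""
    # feature_map: (H, W, C_in); weights: (C_in, C_out); bias: (C_out,)
    return feature_map @ weights + bias

rng = np.random.default_rng(0)
num_anchors = 9                          # k anchors per position, as in the paper
height, width, channels = 38, 63, 512    # illustrative feature-map size (assumption)

rpn = rng.standard_normal((height, width, channels))
w_cls = rng.standard_normal((channels, num_anchors * 2)) * 0.01
w_reg = rng.standard_normal((channels, num_anchors * 4)) * 0.01

rpn_cls_score = conv1x1(rpn, w_cls, np.zeros(num_anchors * 2))
rpn_bbox_pred = conv1x1(rpn, w_reg, np.zeros(num_anchors * 4))

print(rpn_cls_score.shape)  # (38, 63, 18): 2 scores per anchor
print(rpn_bbox_pred.shape)  # (38, 63, 36): 4 box deltas per anchor
```

Changing height and width changes the output shapes but leaves w_cls and w_reg untouched, which is what sharing parameters across the feature map means here.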

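Generating ROIs with the RPN

The RPN's anchor mechanism amounts to a brute-force enumeration. In the paper, with stride = 16, a 1000x600 image yields roughly 60x40x9 ≈ 20,000 anchor boxes. With this many anchor boxes, a filtering step is needed to discard duplicate and invalid ones in order to obtain better region proposals.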

    if is_training:
      rois, roi_scores = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
      rpn_labels = self._anchor_target_layer(rpn_cls_score, "anchor")
      with tf.control_dependencies([rpn_labels]):
        rois, _ = self._proposal_target_layer(rois, roi_scores, "rpn_rois")
    else:
      if cfg.TEST.MODE == 'nms':
        rois, _ = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
      elif cfg.TEST.MODE == 'top':
        rois, _ = self._proposal_top_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
      else:
        raise NotImplementedError
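
The anchor count quoted earlier in this section can be reproduced with a short sketch. It assumes the paper's 3 scales (128, 256, 512) and 3 aspect ratios (1:2, 1:1, 2:1); the function name generate_anchors and the exact centering and rounding are assumptions and may differ from the repository's own anchor code.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * 9, 4) anchors as (x1, y1, x2, y2) in image coordinates."""
    base = []
    for ratio in ratios:          # ratio = height / width
        for scale in scales:      # each anchor has area scale ** 2
            w = scale / np.sqrt(ratio)
            h = scale * np.sqrt(ratio)
            base.append([-w / 2, -h / 2, w / 2, h / 2])
    base = np.array(base)         # (9, 4), centered at the origin

    # Centers of the feature-map cells, mapped back to image coordinates.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)  # each of shape (feat_h, feat_w)
    shifts = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)

    return (shifts + base).reshape(-1, 4)

# A 1000x600 image with stride 16 gives a 62x37 feature map under floor division.
anchors = generate_anchors(feat_h=600 // 16, feat_w=1000 // 16)
print(anchors.shape)  # (20646, 4)
```

Many of these anchors extend past the image border or overlap each other heavily, which is why the filtering steps described in this section are needed.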

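_proposal_layer:

The rpn_bbox_pred produced by the network is a parameterized bounding box. It first has to be converted into a box in image coordinates, and any part that extends beyond the image boundary is clipped off.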
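
For reference, the parameterization used in the Faster R-CNN paper can be written as below, where (x, y, w, h) are the center coordinates, width, and height of the predicted box and the subscript a marks the anchor. The second group is simply the inverse mapping that the conversion step has to apply.

```latex
\begin{aligned}
t_x &= (x - x_a) / w_a, & t_y &= (y - y_a) / h_a, \\
t_w &= \log(w / w_a),   & t_h &= \log(h / h_a), \\[4pt]
x &= t_x \, w_a + x_a,  & y &= t_y \, h_a + y_a, \\
w &= w_a \exp(t_w),     & h &= h_a \exp(t_h).
\end{aligned}
```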

parameterization of the bounding box coordinate
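
A minimal NumPy sketch of this decode-and-clip step follows. The function names decode_boxes and clip_boxes are illustrative, and the sketch uses continuous coordinates (width = x2 - x1) rather than the inclusive-pixel +1 convention that appears in the NMS code of this article, so treat it as an approximation of what the repository does.

```python
import numpy as np

def decode_boxes(anchors, deltas):
    """Apply (tx, ty, tw, th) deltas to (x1, y1, x2, y2) anchors; return (x1, y1, x2, y2)."""
    wa = anchors[:, 2] - anchors[:, 0]
    ha = anchors[:, 3] - anchors[:, 1]
    xa = anchors[:, 0] + 0.5 * wa
    ya = anchors[:, 1] + 0.5 * ha

    x = deltas[:, 0] * wa + xa      # predicted center x
    y = deltas[:, 1] * ha + ya      # predicted center y
    w = np.exp(deltas[:, 2]) * wa   # predicted width
    h = np.exp(deltas[:, 3]) * ha   # predicted height

    return np.stack([x - 0.5 * w, y - 0.5 * h, x + 0.5 * w, y + 0.5 * h], axis=1)

def clip_boxes(boxes, im_h, im_w):
    """Clip boxes so that they lie inside an im_h x im_w image."""
    boxes = boxes.copy()
    boxes[:, 0::2] = np.clip(boxes[:, 0::2], 0, im_w - 1)   # x1, x2
    boxes[:, 1::2] = np.clip(boxes[:, 1::2], 0, im_h - 1)   # y1, y2
    return boxes

anchors = np.array([[0.0, 0.0, 100.0, 100.0]])
# Shift the center right by 10% of the anchor width and double the height.
deltas = np.array([[0.1, 0.0, 0.0, np.log(2.0)]])
boxes = clip_boxes(decode_boxes(anchors, deltas), im_h=600, im_w=1000)
print(boxes)  # approximately [[10, 0, 110, 150]]; y1 = -50 was clipped to 0
```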

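NMS (non-maximum suppression) is then applied, and the top N ROIs by score are returned.

Non-Maximum Suppression (NMS)

NMS works as follows: all proposals are sorted by score; the highest-scoring proposal is kept, and every remaining proposal whose overlap with it exceeds a threshold is discarded. This repeats until no proposals are left.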

import numpy as np

def py_cpu_nms(dets, thresh):
    """Pure Python NMS baseline."""
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    scores = dets[:, 4]

    areas = (x2 - x1 + 1) * (y2 - y1 + 1)  # area of every box
    order = scores.argsort()[::-1]  # box indices sorted by score, descending

    keep = []  # indices of the boxes that are kept
    while order.size > 0:
        i = order[0]  # index of the highest-scoring box not yet processed
        keep.append(i)  # keep this box
        # Vectorized: corners of the intersection of box i with each remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        # Area of each intersection rectangle (zero when the boxes do not overlap)
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        # Overlap ratio (IoU)
        ovr = inter / (areas[i] + areas[order[1:]] - inter)

        # Keep only the boxes whose overlap is at most the threshold, then continue
        inds = np.where(ovr <= thresh)[0]
        order = order[inds + 1]  # +1 because ovr was computed against order[1:]

    return keep
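
The overlap ratio ovr computed in the loop above is the intersection over union (IoU). As a small worked example, here is a scalar version for two boxes using the same inclusive-pixel +1 convention; the helper name box_iou is hypothetical and not part of the original code.

```python
def box_iou(box_a, box_b):
    """IoU of two (x1, y1, x2, y2) boxes, using the inclusive-pixel (+1) convention."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]) + 1)
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]) + 1)
    inter = iw * ih
    area_a = (box_a[2] - box_a[0] + 1) * (box_a[3] - box_a[1] + 1)
    area_b = (box_b[2] - box_b[0] + 1) * (box_b[3] - box_b[1] + 1)
    return inter / (area_a + area_b - inter)

iou_same = box_iou([0, 0, 9, 9], [0, 0, 9, 9])        # identical boxes
iou_half = box_iou([0, 0, 9, 9], [5, 0, 14, 9])       # half of each box overlaps
iou_none = box_iou([0, 0, 9, 9], [20, 20, 29, 29])    # disjoint boxes
print(iou_same, iou_half, iou_none)
```

Two 10x10 boxes that share half their area have an intersection of 50 and a union of 150, so their IoU is 1/3. With a threshold of 0.3 the lower-scoring box would be suppressed; with 0.5 it would survive.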

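_anchor_target_layer:

This function filters out the anchors that fall outside the image, assigns positive and negative labels based on IoU, and finally returns rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, and rpn_bbox_outside_weights.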
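
The labelling step can be sketched as follows. The thresholds 0.7 and 0.3 are the ones given in the paper; the function names, the label encoding (1 = positive, 0 = negative, -1 = ignored), and the continuous-coordinate IoU are assumptions for illustration and may not match the repository exactly.

```python
import numpy as np

def pairwise_iou(anchors, gt_boxes):
    """IoU matrix of shape (num_anchors, num_gt) for (x1, y1, x2, y2) boxes."""
    x1 = np.maximum(anchors[:, None, 0], gt_boxes[None, :, 0])
    y1 = np.maximum(anchors[:, None, 1], gt_boxes[None, :, 1])
    x2 = np.minimum(anchors[:, None, 2], gt_boxes[None, :, 2])
    y2 = np.minimum(anchors[:, None, 3], gt_boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    return inter / (area_a[:, None] + area_g[None, :] - inter)

def label_anchors(anchors, gt_boxes, im_h, im_w, pos_thresh=0.7, neg_thresh=0.3):
    """Label each anchor: 1 = positive, 0 = negative, -1 = ignored."""
    labels = np.full(len(anchors), -1, dtype=np.int64)
    # Anchors that cross the image boundary stay ignored.
    inside = ((anchors[:, 0] >= 0) & (anchors[:, 1] >= 0) &
              (anchors[:, 2] < im_w) & (anchors[:, 3] < im_h))
    iou = pairwise_iou(anchors, gt_boxes)
    max_iou = iou.max(axis=1)
    labels[inside & (max_iou < neg_thresh)] = 0
    labels[inside & (max_iou >= pos_thresh)] = 1
    # The best-matching inside anchor of each ground-truth box is positive as well.
    masked = np.where(inside[:, None], iou, -1.0)
    best = masked.argmax(axis=0)
    has_overlap = masked[best, np.arange(len(gt_boxes))] > 0
    labels[best[has_overlap]] = 1
    return labels

anchors = np.array([
    [0, 0, 100, 100],       # coincides with the ground-truth box
    [10, 10, 110, 110],     # IoU about 0.68: neither positive nor negative
    [300, 300, 400, 400],   # no overlap
    [-50, -50, 50, 50],     # crosses the image boundary
], dtype=float)
gt_boxes = np.array([[0, 0, 100, 100]], dtype=float)
labels = label_anchors(anchors, gt_boxes, im_h=600, im_w=1000)
print(labels)  # [ 1 -1  0 -1]
```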

_proposal_target_layer:

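During training each batch consists of a single image, which contains many positive and negative anchors. Usually not all of these samples take part in training, because negatives typically dominate and would bias the network. In the paper, 256 anchors are randomly sampled from each image, with a positive-to-negative ratio of up to 1:1; if there are fewer than 128 positives, the batch is padded with negatives. This function returns the ROIs, the boxes used for regression, and the bbox_inside_weights and bbox_outside_weights used to compute the loss.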
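
The sampling rule described above can be sketched in a few lines. The function name sample_minibatch and the label encoding (1 = positive, 0 = negative, -1 = ignored) are assumptions for illustration; surplus samples are simply marked as ignored.

```python
import numpy as np

def sample_minibatch(labels, batch_size=256, fg_fraction=0.5, rng=None):
    """Return a copy of labels in which surplus samples are set to -1 (ignored)."""
    rng = np.random.default_rng() if rng is None else rng
    labels = labels.copy()

    # Cap the positives at batch_size * fg_fraction (128 by default).
    max_fg = int(batch_size * fg_fraction)
    fg_inds = np.flatnonzero(labels == 1)
    if len(fg_inds) > max_fg:
        drop = rng.choice(fg_inds, size=len(fg_inds) - max_fg, replace=False)
        labels[drop] = -1

    # Fill the rest of the batch with negatives.
    max_bg = batch_size - int(np.sum(labels == 1))
    bg_inds = np.flatnonzero(labels == 0)
    if len(bg_inds) > max_bg:
        drop = rng.choice(bg_inds, size=len(bg_inds) - max_bg, replace=False)
        labels[drop] = -1
    return labels

# 40 positives among 20000 candidates: fewer than 128, so negatives fill the gap.
labels = np.zeros(20000, dtype=np.int64)
labels[:40] = 1
sampled = sample_minibatch(labels, rng=np.random.default_rng(0))
print((sampled == 1).sum(), (sampled == 0).sum())  # 40 216
```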

Originally published: 2019-07-20. In case of infringement, please contact cloudcommunity@tencent.com for removal.

Machine Learning