
Is the milestone Faster RCNN hard to reproduce? We gave it a try | Full code included


Author | 已退逼乎

[Intro] As of 2019, outside the proprietary networks of the big AI companies, Mask R-CNN and Cascade R-CNN have become the foundation supporting many real-world services, and reproducing other detection networks on top of Faster R-CNN saves a great deal of time and effort, so Faster R-CNN can fairly be called a milestone. This article therefore walks through a concise, readable reproduction of ResNet Faster R-CNN.

Reproducing Faster RCNN

Before diving in, let's be clear about what detection tasks actually do.

Requirements:

Classify objects of specific categories in an image and work out where they are; the final information we need is:

  • object-classes_name
  • object-position

In typical detection tasks the category is usually represented by an index, e.g. 1 -> apple, 2 -> cat, 3 -> dog, ... while the position is usually given by two coordinate pairs: the top-left and bottom-right corners of a rectangle, (x1, y1, x2, y2).

Faster R-CNN is the most important network in the evolution of two-stage detectors and can fairly be seen as a milestone of the detection task.

Its extensions, Mask R-CNN and Cascade R-CNN, have become (as of 2019, outside the proprietary networks of the big AI companies) the foundation on which many services are built. So reproducing Faster RCNN from scratch in PyTorch is well worth it: it involves a great deal of craft and prior knowledge that theory alone never covers.

Better still, once you have Faster RCNN as a base, the time and effort needed to reproduce other detection networks drop dramatically.

Our goal: reproduce ResNet Faster R-CNN in the most concise way possible, staying as close to the paper as we can.

Note: the code in this article consists of structural example snippets; it cannot be copy-pasted and run directly.

The full table of contents will be updated once the article is complete.

Architecture

The original Faster RCNN used VGG16 as its backbone network.

However, VGG16/19 suffers from a rapidly ballooning parameter count, and as the stack of layers gets deeper, the gradients that must be propagated backwards decay layer by layer, so the weights of the early layers can no longer be adjusted effectively.

This is where VGG19 hits its limits.

The residual networks proposed afterwards add shortcut connections that give gradients a direct path back to earlier layers, alleviating the vanishing-gradient problem. They pushed network depth past 100 layers (the later DenseNet even produced practical 200+ layer networks) while making heavy use of 1x1 convolutions to keep the parameter count down. This article will therefore try ResNet-101 + Faster RCNN, and touch on the possibility of hooking DenseNet up to Faster R-CNN as well.

From the architecture diagram above we can see that, besides the backbone used for feature extraction, the most critical remaining parts of Faster R-CNN are the following:

  • RPN
  • RPN LossFunction
  • ROI Pooling
  • Faster-R-CNN Loss Function

In other words, our reproduction should focus on these parts. The best reproduction available today is probably Jianwei Yang's version.

The code in this article blends several implementations' styles with standard PyTorch structure.

0. Overall flow

First, a look at the code (a rough skim is enough):

################################# Non-ResNet version -- just look at the basic structure
class FasterRCNN(nn.Module):
    n_classes = 21
    classes = np.asarray(['__background__',
                       'aeroplane', 'bicycle', 'bird', 'boat',
                       'bottle', 'bus', 'car', 'cat', 'chair',
                       'cow', 'diningtable', 'dog', 'horse',
                       'motorbike', 'person', 'pottedplant',
                       'sheep', 'sofa', 'train', 'tvmonitor'])
    PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]])
    SCALES = (600,)
    MAX_SIZE = 1000

    def __init__(self, classes=None, debug=False):
        super(FasterRCNN, self).__init__()

        if classes is not None:
            self.classes = classes
            self.n_classes = len(classes)

        self.rpn = RPN()
        self.roi_pool = RoIPool(7, 7, 1.0/16)
        self.fc6 = FC(512 * 7 * 7, 4096)
        self.fc7 = FC(4096, 4096)
        self.score_fc = FC(4096, self.n_classes, relu=False)
        self.bbox_fc = FC(4096, self.n_classes * 4, relu=False)

        # loss
        self.cross_entropy = None
        self.loss_box = None



    @property
    def loss(self):
        return self.cross_entropy + self.loss_box * 10

    def forward(self, im_data, im_info, gt_boxes=None, gt_ishard=None, dontcare_areas=None):
        features, rois = self.rpn(im_data, im_info, gt_boxes, gt_ishard, dontcare_areas)

        if self.training:
            roi_data = self.proposal_target_layer(rois, gt_boxes, gt_ishard, dontcare_areas, self.n_classes)
            rois = roi_data[0]

        # roi pool
        pooled_features = self.roi_pool(features, rois)
        x = pooled_features.view(pooled_features.size()[0], -1)
        x = self.fc6(x)
        x = F.dropout(x, training=self.training)
        x = self.fc7(x)
        x = F.dropout(x, training=self.training)

        cls_score = self.score_fc(x)
        cls_prob = F.softmax(cls_score, dim=1)
        bbox_pred = self.bbox_fc(x)

        if self.training:
            self.cross_entropy, self.loss_box = self.build_loss(cls_score, bbox_pred, roi_data)

        return cls_prob, bbox_pred, rois


This is not the complete definition; it shows only the main flow. All helper and utility methods are omitted, and the structure is greatly simplified. If we take the data as our thread, we get the following flow.

Faster RCNN processing flow, with data as the nodes

In the main network above we can clearly see the order in which the forward pass executes the whole pipeline (just take a look).

Note that some of the steps above need corresponding helper functions, e.g. building the losses or generating boxes; a complete utility library is needed to support them.

The flow chart above, and this article's order of exposition, both follow the data. Being clear about each part's inputs, outputs, and the computation between them is essential.

The initial training data consists of images together with their annotations (boxes and class labels):

1.DataLoader

Naturally, the data-loading stage must ingest our dataset. This article uses the most common standards:

  • the COCO 2014/17 standard
  • the Pascal VOC standard
  • custom dataset standards

We obviously cannot show everything here, but the open-source project will demonstrate how VOC-like and custom datasets can be loaded conveniently -> open source quick-training tool (unfinished) (https://github.com/OOXXXXOO/WSNet)

To stay on topic, this article only walks through loading custom and VOC-like datasets.

Data2Dataset

Raw data mostly means images; let's take a single image as an example.

After annotating it with labelme...

...and saving, the tool automatically generates something like:

{
    "version": "3.4.1",
    "flags": {},
    "shapes": [
        {
            "label": "dog",
            "line_color": null,
            "fill_color": null,
            "points": [
                [
                    7,
                    144
                ],
                [
                    307,
                    588
                ]
            ],
            "shape_type": "rectangle"
        },
        ......
        {
            "label": "dog",
            "line_color": null,
            "fill_color": null,
            "points": [
                [
                    756,
                    130
                ],
                [
                    974,
                    507
                ]
            ],
            "shape_type": "rectangle"
        }
    ],
    "lineColor": [
        0,
        255,
        0,
        128
    ],
    "fillColor": [
        255,
        0,
        0,
        128
    ],
    "imagePath": "timg.jpeg",
    "imageData": "此处为base64编码过得图像数据"
}
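For reference, reading such a labelme file into the bboxlist / classlist form used later takes only a few lines (a sketch based on the fields shown above; load_labelme is our own helper name):

import json

def load_labelme(path):
    with open(path) as f:
        ann = json.load(f)
    bboxlist, classlist = [], []
    for shape in ann['shapes']:
        if shape['shape_type'] != 'rectangle':
            continue
        (x1, y1), (x2, y2) = shape['points']  # top-left, bottom-right corners
        bboxlist.append([x1, y1, x2, y2])
        classlist.append(shape['label'])
    return bboxlist, classlist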

There are also labelimg-style XML samples:

<annotation>
    <folder>图片</folder>
    <filename>timg.jpeg</filename>
    <path>/home/winshare/图片/timg.jpeg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>1000</width>
        <height>612</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>dog</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>9</xmin>
            <ymin>163</ymin>
            <xmax>309</xmax>
            <ymax>584</ymax>
        </bndbox>
    </object>
....
    <object>
        <name>dog</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>748</xmin>
            <ymin>142</ymin>
            <xmax>977</xmax>
            <ymax>508</ymax>
        </bndbox>
    </object>
</annotation>

As well as YOLO-style bboxes:

class_id box
0 0.159000 0.610294 0.300000 0.687908
0 0.346000 0.433824 0.216000 0.638889
0 0.491500 0.449346 0.191000 0.588235
0 0.650000 0.511438 0.246000 0.614379
0 0.863000 0.535948 0.230000 0.588235

YOLO's box values are produced from corner coordinates by the method below, which normalizes (xmin, xmax, ymin, ymax) into the relative (x, y, w, h) format:

def convert(size, box):  # normalization
    # size: image (width, height)
    # box: bounding box as (xmin, xmax, ymin, ymax) in pixels
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)
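Plugging in the first dog box from the XML above (image 1000x612, x from 9 to 309, y from 163 to 584) reproduces the first row of the YOLO listing:

# size = (width, height); box = (xmin, xmax, ymin, ymax)
print(convert((1000, 612), (9, 309, 163, 584)))
# (0.159, 0.610294..., 0.3, 0.687908...)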

Dataset2Dataloader

In many Faster RCNN reproductions the data-loading pipeline was never standardized, so data got wrapped in all sorts of datalayer, roidb, etc. constructs. Since PyTorch 0.4 there is effectively a standard form for data input.

So we define getdata() to take an index into a JSON list of samples and return the information we need: image, bboxlist, classlist.

Here bboxlist is the list of all annotated object boxes in one image and classlist the corresponding class names, with the two lists kept index-aligned.

With these design goals we can start building the code.

class Dataset:
    def __init__(self, opt):
        self.opt = opt
        self.db = DatasetClass(opt.data_dir)
        # source of the raw data; for custom data, build a list of samples
        # and read it with a generic reader
        self.tsf = Transform(opt.min_size, opt.max_size)

    def __getitem__(self, idx):
        image, bboxlist, classlist= self.db.getdata(idx)
        img, bbox, label, scale = self.tsf((image, bboxlist, classlist))  # transform the raw data
        # TODO: check whose stride is negative to fix this instead copy all
        # some of the strides of a given numpy array are negative.
        return img.copy(), bbox.copy(), label.copy(), scale

    def __len__(self):
        return len(self.db)

The transform step converts the data to tensors and applies augmentation such as shifts and flips. PyTorch actually ships a whole family of Transforms, so much of the code below could be omitted or replaced.

class Transform(object):

    def __init__(self, min_size=600, max_size=1000):
        self.min_size = min_size
        self.max_size = max_size

    def __call__(self, in_data):
        img, bbox, label = in_data
        _, H, W = img.shape
        img = preprocess(img, self.min_size, self.max_size)
        _, o_H, o_W = img.shape
        scale = o_H / H
        bbox = util.resize_bbox(bbox, (H, W), (o_H, o_W))

        # horizontally flip
        img, params = util.random_flip(
            img, x_random=True, return_param=True)
        bbox = util.flip_bbox(
            bbox, (o_H, o_W), x_flip=params['x_flip'])

        return img, bbox, label, scale

In practice, image processing written with torchvision.transforms conforms better to standard PyTorch style; we won't dwell on it further here. After Dataset's processing and wrapping, the samples it yields could in principle already be used for training, but one last layer of wrapping is still needed.
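For reference, a minimal torchvision-style equivalent of the image half of the Transform above might look like the sketch below (our own illustration; note that torchvision's basic transforms act only on the image, so box resizing/flipping still needs the custom util functions):

import torchvision.transforms as T

image_transform = T.Compose([
    T.ToPILImage(),
    T.Resize(600),            # roughly plays the role of min_size
    T.ToTensor(),             # HWC uint8 -> CHW float in [0, 1]
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])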

from torch.utils import data as data_
dataset = Dataset(opt)  # opt holds data_dir / min_size / max_size as above
dataloader = data_.DataLoader(dataset,
                              batch_size=1,
                              shuffle=True,
                              # pin_memory=True,
                              num_workers=num_workers)

In the PyTorch scheme of things, the end goal of data loading is to wrap the dataset object in a DataLoader, which conveniently controls batching, shuffling and so on.

We recommend converting the raw data into a list or an index-keyed dict: training is IO-heavy, and formats that are slow to index will noticeably drag down training speed.

One point worth noting: in the DataLoader above, num_workers is the number of processes responsible for data loading. torch.multiprocessing is a wrapper around the native multiprocessing module.

It registers custom reducers and uses shared memory to give different processes a shared view of the same data. Once a tensor/storage has been moved to shared memory, sending it to any other process incurs no copying cost.

The API is 100% compatible with the native module, so changing import multiprocessing to import torch.multiprocessing is enough to have all tensors sent through queues, or shared via other mechanisms, moved into shared memory.

Python 3 supports sharing CUDA tensors between processes, started with the spawn or forkserver methods. multiprocessing in Python 2 can only create subprocesses with fork, which the CUDA runtime does not support.
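For example (a sketch; only needed when worker processes must exchange CUDA tensors):

import torch.multiprocessing as mp

if __name__ == '__main__':
    # 'spawn' (or 'forkserver') is required for sharing CUDA tensors
    # between processes on Python 3; plain 'fork' does not support CUDA.
    mp.set_start_method('spawn')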

The following code shows what the workers actually look like inside the DataLoader:

if self.num_workers > 0:
            # worker_init_fn is the worker initialization function
            self.worker_init_fn = loader.worker_init_fn
            # index_queues: one index queue per worker process
            self.index_queues = [multiprocessing.Queue() for _ in range(self.num_workers)]
            # round-robin pointer into the worker queues
            self.worker_queue_idx = 0
            # worker_result_queue handles inter-process communication;
            # multiprocessing.SimpleQueue is a simplified multiprocessing.Queue([maxsize])
            # with only three methods: empty(), get(), put()
            self.worker_result_queue = multiprocessing.SimpleQueue()
            # batches_outstanding:
            # number of batches already prepared (some may still be in flight);
            # 0 means the dataset has no data left.
            # starts at 0, +1 in self._put_indices(), -1 in self.__next__
            self.batches_outstanding = 0
            self.worker_pids_set = False
            # shutdown=True shuts the workers down
            self.shutdown = False
            # send_idx / rcvd_idx: send and receive indices
            # send_idx records the idx of the batch about to be put on index_queue
            self.send_idx = 0
            # rcvd_idx records the idx of the batch to take from data_queue next
            self.rcvd_idx = 0
            # with multiple processes, batches in data_queue may arrive out of order;
            # this dict guarantees batches are returned in ascending send_idx order
            self.reorder_dict = {}

            # spawn num_workers worker processes
            self.workers = [
                multiprocessing.Process(
                    target=_worker_loop,
                    args=(self.dataset, self.index_queues[i],
                          self.worker_result_queue, self.collate_fn, base_seed + i,
                          self.worker_init_fn, i))
                for i in range(self.num_workers)]

            # the CUDA / timeout cases are not analysed here
            if self.pin_memory or self.timeout > 0:
                ...
            else:
                # data_queue is just self.worker_result_queue
                # (a multiprocessing.SimpleQueue), the single result queue
                self.data_queue = self.worker_result_queue
            # mark the workers as daemon processes
            for w in self.workers:
                w.daemon = True  # ensure that the worker exits on process exit
                w.start()

            ...

            # prime the prefetch loop:
            # at init, put 2*num_workers (batch_idx, sampler_indices) pairs on index_queue
            for _ in range(2 * self.num_workers):
                self._put_indices()
Through these layers of wrapping we get the high-speed loading, transformation, batching, shuffling and other operations the training flow needs; during training an iterator then hands the data to the network. Finally:
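A minimal sketch of that last step (the article's original snippet is missing at this point, so this is our assumption of what it looked like), iterating the DataLoader inside the training loop:

for epoch in range(num_epochs):
    for img, bbox, label, scale in dataloader:
        img, bbox, label = img.cuda(), bbox.cuda(), label.cuda()
        scale = scale.item()  # per-image rescale factor from the Transform
        # ... forward pass, loss, and optimizer step go here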
At this point the data can be fed straight into the network, and we move smoothly on to the next stage.

2.BackBone - Resnet/VGG

As the backbone of a two-stage network, its depth and quality deeply affect the whole network's performance; both its forward-inference speed and its accuracy are critical. VGG, the original backbone, now lags behind newer networks in every respect, so we start from the structure of ResNet.

Comparison of the original VGG network and ResNet-34

Compared with VGG's assorted problems, ResNet introduced residual blocks that let unnecessary convolution steps be skipped, which makes deepening the network and extracting higher-level features far easier. Since then, almost every backbone improvement has started from optimizing the block structure; DenseNet, for instance, explored richer connection patterns between blocks.

For simplicity we start from the most basic ResNet backbone.

1. BasicBlock

In code, this shows up as the skip-connection structure inside ResNet's building blocks.

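The block below calls a conv3x3 helper (and Bottleneck later uses conv1x1); for completeness, here is a minimal torchvision-style definition with the needed imports:

import math
import torch.nn as nn

def conv3x3(in_planes, out_planes, stride=1):
    # 3x3 convolution with padding, as in torchvision's ResNet
    return nn.Conv2d(in_planes, out_planes, kernel_size=3,
                     stride=stride, padding=1, bias=False)

def conv1x1(in_planes, out_planes, stride=1):
    # 1x1 convolution
    return nn.Conv2d(in_planes, out_planes, kernel_size=1,
                     stride=stride, bias=False)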
class BasicBlock(nn.Module):
    expansion = 1
    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)
        return out
2. Bottleneck

class Bottleneck(nn.Module):
    expansion = 4
    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = conv1x1(inplanes, planes)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = conv3x3(planes, planes, stride)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = conv1x1(planes, planes * self.expansion)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        out = self.relu(out)

        return out

In the blocks above, the skip path is controlled by downsample(): the block checks whether downsample is None. If it is, the identity added to the convolution output is simply the input x itself; if the block changes the spatial size or channel count, downsample projects x so that the shapes match before the addition. Let's see from first principles why this works, starting by defining the residual unit:

$$y_l = h(x_l) + F(x_l, W_l), \qquad x_{l+1} = f(y_l)$$

where $h(x_l)$ is the identity mapping, usually taken directly as $x_l$; $F(x_l, W_l)$ is the residual function with weights $W_l$; and $f$ is the ReLU activation. Taking $f$ as identity as well, unrolling this definition from layer $l$ to a deeper layer $L$ gives:

$$x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i)$$

Backpropagating the gradient:

$$\frac{\partial\,\mathrm{loss}}{\partial x_l} = \frac{\partial\,\mathrm{loss}}{\partial x_L}\left(1 + \frac{\partial}{\partial x_l}\sum_{i=l}^{L-1} F(x_i, W_i)\right)$$

The first factor is the gradient of the loss at layer $L$, and the term in parentheses contains the residual gradient. Compared with the multiplicative structure of a plain network, this additive structure has a direct benefit: even when the loss gradient is very small, the constant 1 keeps the residual gradient from vanishing, so the layer does not suffer the multiplicative gradient decay of a traditional network. A detailed discussion can be found in the paper below.

Identity Mappings in Deep Residual Networks

(https://arxiv.org/pdf/1603.05027.pdf)
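A two-line autograd check makes the point concrete (our own toy example, not from the paper): with y = x + F(x) and a residual branch whose gradient is zero, the gradient reaching x is still exactly 1.

import torch

x = torch.ones(3, requires_grad=True)
residual = 0.0 * x             # a degenerate residual branch F(x) with zero gradient
y = (x + residual).sum()       # skip connection: y = sum(x + F(x))
y.backward()
print(x.grad)                  # tensor([1., 1., 1.]) -- the "+1" path survives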

Since we only need the backbone as a feature-extraction tool that ultimately turns an image into a feature map matching the other components' inputs, let's see how the final backbone is assembled:

class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes=1000):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)

        self.avgpool = nn.AvgPool2d(7, stride=1)
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x

The standard ResNet produced this way certainly cannot be used by us as-is: the RPN interface needs a feature map, not the output of the final fully connected layer. So we need layer3's output rather than fc's. Which raises the question:

How do we graft ResNet onto the RPN?

That comes down to the size the RPN expects versus the size each layer outputs. From the architecture table we can see that the last 1x1 convolution in ResNet-101's layer3 bottlenecks produces 1024 channels, so layer3's feature map has 1024 channels (at 1/16 the input resolution).
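A quick way to check this (a sketch using torchvision's stock definition, not the article's code):

import torch
import torchvision

resnet = torchvision.models.resnet101()
base = torch.nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
                           resnet.layer1, resnet.layer2, resnet.layer3)
x = torch.randn(1, 3, 600, 800)
print(base(x).shape)  # torch.Size([1, 1024, 38, 50]) -- 1024 channels, stride 16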

Adaptation: AnyFeature to RPN

Taking ResNet-101 as the example, let's see what it takes for a typical ResNet to serve as Faster RCNN's backbone.

In the original, with VGG as the backbone, the code reads:

        self.features = VGG16(bn=False)
        self.conv1 = Conv2d(512, 512, 3, same_padding=True)
        ...

    def forward(self, im_data, im_info, gt_boxes=None, gt_ishard=None, dontcare_areas=None):
        features = self.features(im_data)
        rpn_conv1 = self.conv1(features)

In other words, we just have to match the size of the feature the backbone produces to the size conv1 expects. When defining a ResNet-for-RPN class, the following flow is a good reference:

class resnet(_fasterRCNN):
  def __init__(self, classes, num_layers=101, pretrained=False, class_agnostic=False):
    self.model_path = 'data/pretrained_model/resnet101_caffe.pth'
    self.dout_base_model = 1024
    self.pretrained = pretrained
    self.class_agnostic = class_agnostic

    _fasterRCNN.__init__(self, classes, class_agnostic)

  def _init_modules(self):
    resnet = resnet101()

    if self.pretrained == True:
      print("Loading pretrained weights from %s" %(self.model_path))
      state_dict = torch.load(self.model_path)
      resnet.load_state_dict({k:v for k,v in state_dict.items() if k in resnet.state_dict()})

    # Build resnet.
    ##########################
    self.RCNN_base = nn.Sequential(resnet.conv1, resnet.bn1,resnet.relu,
      resnet.maxpool,resnet.layer1,resnet.layer2,resnet.layer3)

    self.RCNN_top = nn.Sequential(resnet.layer4)
    ##########################
    self.RCNN_cls_score = nn.Linear(2048, self.n_classes)
    if self.class_agnostic:
      self.RCNN_bbox_pred = nn.Linear(2048, 4)
    else:
      self.RCNN_bbox_pred = nn.Linear(2048, 4 * self.n_classes)

    # Fix blocks
    for p in self.RCNN_base[0].parameters(): p.requires_grad=False
    for p in self.RCNN_base[1].parameters(): p.requires_grad=False

    assert (0 <= cfg.RESNET.FIXED_BLOCKS < 4)
    if cfg.RESNET.FIXED_BLOCKS >= 3:
      for p in self.RCNN_base[6].parameters(): p.requires_grad=False
    if cfg.RESNET.FIXED_BLOCKS >= 2:
      for p in self.RCNN_base[5].parameters(): p.requires_grad=False
    if cfg.RESNET.FIXED_BLOCKS >= 1:
      for p in self.RCNN_base[4].parameters(): p.requires_grad=False

    def set_bn_fix(m):
      classname = m.__class__.__name__
      if classname.find('BatchNorm') != -1:
        for p in m.parameters(): p.requires_grad=False

    self.RCNN_base.apply(set_bn_fix)
    self.RCNN_top.apply(set_bn_fix)

As you can see, the common practice is to split the backbone in two. For ResNet-101, the construction is divided into two parts:

self.RCNN_base = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu,
                               resnet.maxpool, resnet.layer1, resnet.layer2, resnet.layer3)

self.RCNN_top = nn.Sequential(resnet.layer4)

def _head_to_tail(self, pool5):
    fc7 = self.RCNN_top(pool5).mean(3).mean(2)
    return fc7

base_feat = self.RCNN_base(im_data)

# feed the base feature map to the RPN to obtain rois
rois, rpn_loss_cls, rpn_loss_bbox = self.RCNN_rpn(base_feat, im_info, gt_boxes, num_boxes)

That is, everything from the stem through layer3 forms one part and layer4 the other. RCNN_base then produces the shared feature map fed into the RPN, while the features coming out of ROI Pooling (or Align) pass through the final layer4:

pooled_feat = self.RCNN_roi_pool(base_feat, rois.view(-1, 5))
# .................. the final grafting point
pooled_feat = self._head_to_tail(pooled_feat)  # then compute bbox offsets

After layer4, the pooled features finally go into class and box prediction. Now let's turn to the RPN itself:

class RPN(nn.Module):
    _feat_stride = [16, ]
    anchor_scales = [8, 16, 32]

    def __init__(self):
        super(RPN, self).__init__()

        self.features = VGG16(bn=False)
        self.conv1 = Conv2d(512, 512, 3, same_padding=True)
        self.score_conv = Conv2d(512, len(self.anchor_scales) * 3 * 2, 1, relu=False, same_padding=False)
        self.bbox_conv = Conv2d(512, len(self.anchor_scales) * 3 * 4, 1, relu=False, same_padding=False)

        # loss
        self.cross_entropy = None
        self.loss_box = None

    @property
    def loss(self):
        return self.cross_entropy + self.loss_box * 10

    def forward(self, im_data, im_info, gt_boxes=None, gt_ishard=None, dontcare_areas=None):
            """
                          |-->rpn_cls_score_net--->_______--->Class Scores--->|softmax--->|Class Probabilities
                          |    w/16,h/16,9,2       reshape
                          |  
        rpn_net -->relu-->|
                          |  
                          |-->rpn_bbx_pred_net---->_______--->Bounding Box regressors---->|
                               w/16,h/16,9,4       reshape
        """
        im_data = network.np_to_variable(im_data, is_cuda=True)
        im_data = im_data.permute(0, 3, 1, 2)
        features = self.features(im_data)

        rpn_conv1 = self.conv1(features)

        # rpn cls score net
        rpn_cls_score = self.score_conv(rpn_conv1)
        rpn_cls_score_reshape = self.reshape_layer(rpn_cls_score, 2)
        rpn_cls_prob = F.softmax(rpn_cls_score_reshape, dim=1)  # softmax over the 2 bg/fg channels
        rpn_cls_prob_reshape = self.reshape_layer(rpn_cls_prob, len(self.anchor_scales)*3*2)

        # rpn bbx pred net
        rpn_bbox_pred = self.bbox_conv(rpn_conv1)

        # proposal layer
        cfg_key = 'TRAIN' if self.training else 'TEST'
        # 
        rois = self.proposal_layer(rpn_cls_prob_reshape, rpn_bbox_pred, im_info,
                                   cfg_key, self._feat_stride, self.anchor_scales)

        # generating training labels and build the rpn loss
        if self.training:
            assert gt_boxes is not None
            rpn_data = self.anchor_target_layer(rpn_cls_score, gt_boxes, gt_ishard, dontcare_areas,
                                                im_info, self._feat_stride, self.anchor_scales)
            self.cross_entropy, self.loss_box = self.build_loss(rpn_cls_score_reshape, rpn_bbox_pred, rpn_data)

        return features, rois
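The reshape_layer helper used above is not shown in the article; in longcw's implementation it essentially regroups the channel dimension so the softmax can run over the 2 bg/fg scores, roughly like this (our reconstruction, an assumption about that codebase):

@staticmethod
def reshape_layer(x, d):
    # (N, C, H, W) -> (N, d, C*H/d, W): regroup channels so dim 1 has size d
    input_shape = x.size()
    x = x.view(input_shape[0], int(d),
               int(float(input_shape[1] * input_shape[2]) / float(d)),
               input_shape[3])
    return x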

The proposal_layer called above turns the RPN outputs into ROIs: it generates anchors, applies the predicted deltas, clips to the image, filters tiny boxes, and runs NMS:
def proposal_layer(rpn_cls_prob_reshape, rpn_bbox_pred, im_info, cfg_key, _feat_stride=[16, ],
                   anchor_scales=[8, 16, 32]):
    """
    Parameters
    ----------
    rpn_cls_prob_reshape: (1 , H , W , Ax2) outputs of RPN, prob of bg or fg
                         NOTICE: the old version is ordered by (1, H, W, 2, A) !!!!
    rpn_bbox_pred: (1 , H , W , Ax4), rgs boxes output of RPN
    im_info: a list of [image_height, image_width, scale_ratios]
    cfg_key: 'TRAIN' or 'TEST'
    _feat_stride: the downsampling ratio of feature map to the original input image
    anchor_scales: the scales to the basic_anchor (basic anchor is [16, 16])
    ----------
    Returns
    ----------
    rpn_rois : (1 x H x W x A, 5) e.g. [0, x1, y1, x2, y2]

    """

    # NOTE: the anchor templates are generated right here
    _anchors = generate_anchors(scales=np.array(anchor_scales))

    _num_anchors = _anchors.shape[0]
    im_info = im_info[0]

    assert rpn_cls_prob_reshape.shape[0] == 1, \
        'Only single item batches are supported'
    # cfg_key = str(self.phase) # either 'TRAIN' or 'TEST'
    # cfg_key = 'TEST'
    pre_nms_topN = cfg[cfg_key].RPN_PRE_NMS_TOP_N
    post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N

    # this threshold matters a great deal
    nms_thresh = cfg[cfg_key].RPN_NMS_THRESH


    min_size = cfg[cfg_key].RPN_MIN_SIZE

    # the first set of _num_anchors channels are bg probs
    # the second set are the fg probs, which we want
    scores = rpn_cls_prob_reshape[:, _num_anchors:, :, :]
    bbox_deltas = rpn_bbox_pred
    # im_info = bottom[2].data[0, :]

    # 1. Generate proposals from bbox deltas and shifted anchors
    height, width = scores.shape[-2:]


    # Enumerate all shifts
    shift_x = np.arange(0, width) * _feat_stride
    shift_y = np.arange(0, height) * _feat_stride
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                        shift_x.ravel(), shift_y.ravel())).transpose()

    # Enumerate all shifted anchors:
    #
    # add A anchors (1, A, 4) to
    # cell K shifts (K, 1, 4) to get
    # shift anchors (K, A, 4)
    # reshape to (K*A, 4) shifted anchors
    A = _num_anchors
    K = shifts.shape[0]
    anchors = _anchors.reshape((1, A, 4)) + \
              shifts.reshape((1, K, 4)).transpose((1, 0, 2))
    anchors = anchors.reshape((K * A, 4))

    # Transpose and reshape predicted bbox transformations to get them
    # into the same order as the anchors:
    #
    # bbox deltas will be (1, 4 * A, H, W) format
    # transpose to (1, H, W, 4 * A)
    # reshape to (1 * H * W * A, 4) where rows are ordered by (h, w, a)
    # in slowest to fastest order
    bbox_deltas = bbox_deltas.transpose((0, 2, 3, 1)).reshape((-1, 4))

    # Same story for the scores:
    #
    # scores are (1, A, H, W) format
    # transpose to (1, H, W, A)
    # reshape to (1 * H * W * A, 1) where rows are ordered by (h, w, a)
    scores = scores.transpose((0, 2, 3, 1)).reshape((-1, 1))

    # Convert anchors into proposals via bbox transformations
    proposals = bbox_transform_inv(anchors, bbox_deltas)

    # 2. clip predicted boxes to image
    proposals = clip_boxes(proposals, im_info[:2])

    # 3. remove predicted boxes with either height or width < threshold
    # (NOTE: convert min_size to input image scale stored in im_info[2])
    keep = _filter_boxes(proposals, min_size * im_info[2])
    proposals = proposals[keep, :]
    scores = scores[keep]

    # # remove irregular boxes, too fat too tall
    # keep = _filter_irregular_boxes(proposals)
    # proposals = proposals[keep, :]
    # scores = scores[keep]

    # 4. sort all (proposal, score) pairs by score from highest to lowest
    # 5. take top pre_nms_topN (e.g. 6000)
    order = scores.ravel().argsort()[::-1]
    if pre_nms_topN > 0:
        order = order[:pre_nms_topN]
    proposals = proposals[order, :]
    scores = scores[order]

    # 6. apply nms (e.g. threshold = 0.7)
    # 7. take after_nms_topN (e.g. 300)
    # 8. return the top proposals (-> RoIs top)
    keep = nms(np.hstack((proposals, scores)), nms_thresh)


    if post_nms_topN > 0:
        keep = keep[:post_nms_topN]
    proposals = proposals[keep, :]
    scores = scores[keep]

    # Output rois blob
    # Our RPN implementation only supports a single input image, so all
    # batch inds are 0
    batch_inds = np.zeros((proposals.shape[0], 1), dtype=np.float32)
    blob = np.hstack((batch_inds, proposals.astype(np.float32, copy=False)))
    return blob
    # top[0].reshape(*(blob.shape))
    # top[0].data[...] = blob

    # [Optional] output scores blob
    # if len(top) > 1:
    #    top[1].reshape(*(scores.shape))
    #    top[1].data[...] = scores
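bbox_transform_inv applies the predicted deltas to the anchors; a simplified single-class NumPy version (following py-faster-rcnn's standard definition, reduced here from the (N, 4K) case to (N, 4)) looks like:

import numpy as np

def bbox_transform_inv(boxes, deltas):
    # anchors (N, 4) + predicted deltas (N, 4) -> proposal boxes (N, 4)
    widths = boxes[:, 2] - boxes[:, 0] + 1.0
    heights = boxes[:, 3] - boxes[:, 1] + 1.0
    ctr_x = boxes[:, 0] + 0.5 * widths
    ctr_y = boxes[:, 1] + 0.5 * heights

    dx, dy, dw, dh = deltas[:, 0], deltas[:, 1], deltas[:, 2], deltas[:, 3]

    pred_ctr_x = dx * widths + ctr_x
    pred_ctr_y = dy * heights + ctr_y
    pred_w = np.exp(dw) * widths
    pred_h = np.exp(dh) * heights

    pred_boxes = np.zeros_like(deltas)
    pred_boxes[:, 0] = pred_ctr_x - 0.5 * pred_w   # x1
    pred_boxes[:, 1] = pred_ctr_y - 0.5 * pred_h   # y1
    pred_boxes[:, 2] = pred_ctr_x + 0.5 * pred_w   # x2
    pred_boxes[:, 3] = pred_ctr_y + 0.5 * pred_h   # y2
    return pred_boxes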
To make anchor generation concrete, here is a step-by-step NumPy walkthrough (stride 16, ratios [0.5, 1, 2], scales [8, 16, 32]):
import numpy as np
sub_sample = 16
ratios = [0.5, 1, 2]
anchor_scales = [8, 16, 32]
anchor_base = np.zeros((len(ratios) * len(anchor_scales), 4), dtype=np.float32)
ctr_y = sub_sample / 2.
ctr_x = sub_sample / 2.
print(ctr_y, ctr_x)
for i in range(len(ratios)):
  for j in range(len(anchor_scales)):
    h = sub_sample * anchor_scales[j] * np.sqrt(ratios[i])
    w = sub_sample * anchor_scales[j] * np.sqrt(1./ ratios[i])
    index = i * len(anchor_scales) + j
    anchor_base[index, 0] = ctr_y - h / 2.
    anchor_base[index, 1] = ctr_x - w / 2.
    anchor_base[index, 2] = ctr_y + h / 2.
    anchor_base[index, 3] = ctr_x + w / 2.

The output above gives the anchors for the first pixel position of the feature map (that was step 1). We must repeat the procedure for every pixel position:

2. At each pixel position of the feature map we generate 9 anchor boxes, each described by (y1, x1, y2, x2). With an 800x800 input and stride 16 there are 50x50 positions, so 50*50*9 = 22500 boxes in total; the anchor data for one image therefore has shape (22500, 4).
3. A fair share of those 22500 boxes actually crosses the image boundary, so a straightforward boundary check filters them out, leaving roughly 17500 valid boxes, shape (17500, 4).

fe_size = (800 // 16)
ctr_x = np.arange(16, (fe_size + 1) * 16, 16)
ctr_y = np.arange(16, (fe_size + 1) * 16, 16)
ctr = np.zeros((fe_size * fe_size, 2), dtype=np.float32)  # centers of all positions
index = 0
for x in range(len(ctr_x)):
    for y in range(len(ctr_y)):
        ctr[index, 1] = ctr_x[x] - 8
        ctr[index, 0] = ctr_y[y] - 8
        index += 1

anchors = np.zeros((fe_size * fe_size * 9, 4))
index = 0
for c in ctr:
  ctr_y, ctr_x = c
  for i in range(len(ratios)):
    for j in range(len(anchor_scales)):
      h = sub_sample * anchor_scales[j] * np.sqrt(ratios[i])
      w = sub_sample * anchor_scales[j] * np.sqrt(1./ ratios[i])
      anchors[index, 0] = ctr_y - h / 2.
      anchors[index, 1] = ctr_x - w / 2.
      anchors[index, 2] = ctr_y + h / 2.
      anchors[index, 3] = ctr_x + w / 2.
      index += 1

print(anchors.shape)
#Out: [22500, 4]

Now we have generated every anchor from first principles. For convenient engineering use we wrap the procedure with the following helpers:

def _whctrs(anchor):
    """
    Return width, height, x center, and y center for an anchor (window).
    """

    w = anchor[2] - anchor[0] + 1
    h = anchor[3] - anchor[1] + 1
    x_ctr = anchor[0] + 0.5 * (w - 1)
    y_ctr = anchor[1] + 0.5 * (h - 1)
    return w, h, x_ctr, y_ctr

def _mkanchors(ws, hs, x_ctr, y_ctr):
    """
    Given a vector of widths (ws) and heights (hs) around a center
    (x_ctr, y_ctr), output a set of anchors (windows).
    """

    ws = ws[:, np.newaxis]
    hs = hs[:, np.newaxis]
    anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
                         y_ctr - 0.5 * (hs - 1),
                         x_ctr + 0.5 * (ws - 1),
                         y_ctr + 0.5 * (hs - 1)))
    return anchors

def _ratio_enum(anchor, ratios):
    """
    Enumerate a set of anchors for each aspect ratio wrt an anchor.
    """

    w, h, x_ctr, y_ctr = _whctrs(anchor)
    size = w * h
    size_ratios = size / ratios
    ws = np.round(np.sqrt(size_ratios))
    hs = np.round(ws * ratios)
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors

def _scale_enum(anchor, scales):
    """
    Enumerate a set of anchors for each scale wrt an anchor.
    """

    w, h, x_ctr, y_ctr = _whctrs(anchor)
    ws = w * scales
    hs = h * scales
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors
def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                     scales=2**np.arange(3, 6)):
    """
    Generate anchor (reference) windows by enumerating aspect ratios X
    scales wrt a reference (0, 0, 15, 15) window.
    """

    base_anchor = np.array([1, 1, base_size, base_size]) - 1
    ratio_anchors = _ratio_enum(base_anchor, ratios)
    anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                         for i in range(ratio_anchors.shape[0])])
    return anchors

print(generate_anchors())
#array([[ -83.,  -39.,  100.,   56.],
#       [-175.,  -87.,  192.,  104.],
#       [-359., -183.,  376.,  200.],
#       [ -55.,  -55.,   72.,   72.],
#       [-119., -119.,  136.,  136.],
#       [-247., -247.,  264.,  264.],
#       [ -35.,  -79.,   52.,   96.],
#       [ -79., -167.,   96.,  184.],
#       [-167., -343.,  184.,  360.]])
Proposal filtering relies on non-maximum suppression (NMS). The snippet below visualizes a set of boxes before and after a pure-Python NMS:
import numpy as np
import matplotlib.pyplot as plt
import cv2
def display(cordlist):
    back=np.zeros((800,800,3),dtype=np.uint8)
    for index,cord in enumerate(cordlist):
        print('draw ',cord)

        color=(np.random.randint(127,255),np.random.randint(127,255),np.random.randint(127,255))
        print('color is ',color)

        cv2.rectangle(back, (int(cord[0]),int(cord[1])), 
        (int(cord[2]),int(cord[3])), color, 1)
        cv2.putText(back, str(cord[4]), (int(cord[0]),int(cord[1])),cv2.FONT_ITALIC,0.5,color, 1)
    plt.imshow(back),plt.show()
    return back



def py_cpu_nms(dets, thresh):
    """Pure Python NMS baseline."""
    # coordinates and score of every box
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    scores = dets[:, 4]

    areas = (x2 - x1 + 1) * (y2 - y1 + 1)  # area of every box
    order = scores.argsort()[::-1]  # box indices sorted by descending score

    keep = []  # indices of the boxes we keep
    while order.size > 0:
        i = order[0]  # highest-scoring box not yet processed
        keep.append(i)  # keep it
        # vectorized: intersection rectangles of box i with every remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        # areas of the intersection rectangles
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        # overlap ratio (IoU)
        ovr = inter / (areas[i] + areas[order[1:]] - inter)

        # keep only the boxes whose overlap is below the threshold, then continue
        inds = np.where(ovr <= thresh)[0]
        order = order[inds + 1]

    return keep
boxes = np.array([[100, 100, 150, 168, 0.63], [166, 70, 312, 190, 0.55], [221, 250, 389, 500, 0.79], [12, 190, 300, 399, 0.9], [28, 130, 134, 302, 0.3]])
display(boxes)  # show NMS input
thresh = 0.1
keep = py_cpu_nms(boxes, thresh)
print('keep:', keep)
keep_ = boxes[keep]  # boxes surviving NMS
display(keep_)  # show NMS output

And an equivalent PyTorch version (a few tensor-API notes first):

# torch.numel() returns the total number of elements in a tensor
# torch.clamp(min, max) clips values to the given bounds
# tensor.item() extracts a tensor element as a Python number

def nms(self, bboxes, scores, threshold=0.5):
        x1 = bboxes[:, 0]
        y1 = bboxes[:, 1]
        x2 = bboxes[:, 2]
        y2 = bboxes[:, 3]
        areas = (x2 - x1) * (y2 - y1)   # [N,] area of each bbox
        _, order = scores.sort(0, descending=True)    # sort by descending score

        keep = []
        while order.numel() > 0:       # torch.numel() returns the element count
            if order.numel() == 1:     # only one box left
                i = order.item()
                keep.append(i)
                break
            else:
                i = order[0].item()    # keep the highest-scoring box, box[i]
                keep.append(i)

            # compute the IoU of box[i] with every remaining box (neat trick)
            xx1 = x1[order[1:]].clamp(min=x1[i])   # [N-1,]
            yy1 = y1[order[1:]].clamp(min=y1[i])
            xx2 = x2[order[1:]].clamp(max=x2[i])
            yy2 = y2[order[1:]].clamp(max=y2[i])
            inter = (xx2 - xx1).clamp(min=0) * (yy2 - yy1).clamp(min=0)   # [N-1,]

            iou = inter / (areas[i] + areas[order[1:]] - inter)  # [N-1,]
            idx = (iou <= threshold).nonzero().squeeze()  # note: idx indexes [N-1,] while order is [N,]
            if idx.numel() == 0:
                break
            order = order[idx + 1]  # shift indices back into order's frame
        return torch.LongTensor(keep)   # PyTorch index tensors are LongTensor


Next, the RPN loss: a cross-entropy classification term plus a smooth L1 box-regression term, built as follows:

def build_loss(self, rpn_cls_score_reshape, rpn_bbox_pred, rpn_data):
        # classification loss
        rpn_cls_score = rpn_cls_score_reshape.permute(0, 2, 3, 1).contiguous().view(-1, 2)
        rpn_label = rpn_data[0].view(-1)

        rpn_keep = Variable(rpn_label.data.ne(-1).nonzero().squeeze()).cuda()
        rpn_cls_score = torch.index_select(rpn_cls_score, 0, rpn_keep)
        rpn_label = torch.index_select(rpn_label, 0, rpn_keep)

        fg_cnt = torch.sum(rpn_label.data.ne(0))

        rpn_cross_entropy = F.cross_entropy(rpn_cls_score, rpn_label)

        # box loss
        rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights = rpn_data[1:]
        rpn_bbox_targets = torch.mul(rpn_bbox_targets, rpn_bbox_inside_weights)
        rpn_bbox_pred = torch.mul(rpn_bbox_pred, rpn_bbox_inside_weights)

        rpn_loss_box = F.smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets, size_average=False) / (fg_cnt + 1e-4)

        return rpn_cross_entropy, rpn_loss_box
Then comes ROI Pooling, implemented as a custom autograd Function whose real computation lives in C and CUDA:

class RoIPoolFunction(Function):
    def __init__(self, pooled_height, pooled_width, spatial_scale):
        self.pooled_width = int(pooled_width)
        self.pooled_height = int(pooled_height)

        self.spatial_scale = float(spatial_scale)
        self.output = None
        self.argmax = None
        self.rois = None
        self.feature_size = None

    def forward(self, features, rois):
        batch_size, num_channels, data_height, data_width = features.size()
        num_rois = rois.size()[0]
        output = torch.zeros(num_rois, num_channels, self.pooled_height, self.pooled_width)
        argmax = torch.IntTensor(num_rois, num_channels, self.pooled_height, self.pooled_width).zero_()

        if not features.is_cuda:
            _features = features.permute(0, 2, 3, 1)
            roi_pooling.roi_pooling_forward(self.pooled_height, self.pooled_width, self.spatial_scale,
                                            _features, rois, output)
            # output = output.cuda()
        else:
            output = output.cuda()
            argmax = argmax.cuda()
            roi_pooling.roi_pooling_forward_cuda(self.pooled_height, self.pooled_width, self.spatial_scale,
                                                 features, rois, output, argmax)
            self.output = output
            self.argmax = argmax
            self.rois = rois
            self.feature_size = features.size()

        return output

    def backward(self, grad_output):
        assert(self.feature_size is not None and grad_output.is_cuda)

        batch_size, num_channels, data_height, data_width = self.feature_size

        grad_input = torch.zeros(batch_size, num_channels, data_height, data_width).cuda()
        roi_pooling.roi_pooling_backward_cuda(self.pooled_height, self.pooled_width, self.spatial_scale,
                                              grad_output, self.rois, grad_input, self.argmax)

        # print grad_input

        return grad_input, None

As you can see, the Python side of ROI Pooling does essentially no computation; all of it is hidden in CUDA-C and C to keep it fast. Let's compare the C and CUDA code (forward). CUDA:

int roi_pooling_forward_cuda(int pooled_height, int pooled_width, float spatial_scale,
                        THCudaTensor * features, THCudaTensor * rois, THCudaTensor * output, THCudaIntTensor * argmax)
{
    // Grab the input tensor
    float * data_flat = THCudaTensor_data(state, features);  // the input feature map
    float * rois_flat = THCudaTensor_data(state, rois);      // the input ROIs

    float * output_flat = THCudaTensor_data(state, output);
    int * argmax_flat = THCudaIntTensor_data(state, argmax);

    // Number of ROIs
    int num_rois = THCudaTensor_size(state, rois, 0);  // number of ROIs
    int size_rois = THCudaTensor_size(state, rois, 1); // size of one ROI record
    if (size_rois != 5)
    {
        return 0;
    }

    // batch size
    int batch_size = THCudaTensor_size(state, features, 0);
    if (batch_size != 1)
    {
        return 0;
    }
    // data height
    int data_height = THCudaTensor_size(state, features, 2);
    // data width
    int data_width = THCudaTensor_size(state, features, 3);
    // Number of channels
    int num_channels = THCudaTensor_size(state, features, 1);

    cudaStream_t stream = THCState_getCurrentStream(state);

    ROIPoolForwardLaucher(
        data_flat, spatial_scale, num_rois, data_height,
        data_width, num_channels, pooled_height,
        pooled_width, rois_flat,
        output_flat, argmax_flat, stream);

    return 1;
}

In this code everything is marshalled into ROIPoolForwardLaucher, which then runs:

int ROIPoolForwardLaucher(
    const float* bottom_data, const float spatial_scale, const int num_rois, const int height,
    const int width, const int channels, const int pooled_height,
    const int pooled_width, const float* bottom_rois,
    float* top_data, int* argmax_data, cudaStream_t stream)
{
    const int kThreadsPerBlock = 1024;
    const int output_size = num_rois * pooled_height * pooled_width * channels;
    cudaError_t err;

    //////////////////////////////////// the real forward computation happens here
    ROIPoolForward<<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock, kThreadsPerBlock, 0, stream>>>(
      output_size, bottom_data, spatial_scale, height, width, channels, pooled_height,
      pooled_width, bottom_rois, top_data, argmax_data);
    ////////////////////////////////////
    err = cudaGetLastError();
    if(cudaSuccess != err)
    {
        fprintf( stderr, "cudaCheckError() failed : %s\n", cudaGetErrorString( err ) );
        exit( -1 );
    }

    return 1;
}

The actual ROI Pooling forward kernel computation:

__global__ void ROIPoolForward(const int nthreads, const float* bottom_data,
    const float spatial_scale, const int height, const int width,
    const int channels, const int pooled_height, const int pooled_width,
    const float* bottom_rois, float* top_data, int* argmax_data)
{
    CUDA_1D_KERNEL_LOOP(index, nthreads)
    {
        // (n, c, ph, pw) is an element in the pooled output
        int n = index;
        int pw = n % pooled_width;
        n /= pooled_width;
        int ph = n % pooled_height;
        n /= pooled_height;
        int c = n % channels;
        n /= channels;

        bottom_rois += n * 5;
        int roi_batch_ind = bottom_rois[0];
        int roi_start_w = round(bottom_rois[1] * spatial_scale);
        int roi_start_h = round(bottom_rois[2] * spatial_scale);
        int roi_end_w = round(bottom_rois[3] * spatial_scale);
        int roi_end_h = round(bottom_rois[4] * spatial_scale);

        // Force malformed ROIs to be 1x1
        int roi_width = fmaxf(roi_end_w - roi_start_w + 1, 1);
        int roi_height = fmaxf(roi_end_h - roi_start_h + 1, 1);
        float bin_size_h = (float)(roi_height) / (float)(pooled_height);
        float bin_size_w = (float)(roi_width) / (float)(pooled_width);

        int hstart = (int)(floor((float)(ph) * bin_size_h));
        int wstart = (int)(floor((float)(pw) * bin_size_w));
        int hend = (int)(ceil((float)(ph + 1) * bin_size_h));
        int wend = (int)(ceil((float)(pw + 1) * bin_size_w));

        // Add roi offsets and clip to input boundaries
        hstart = fminf(fmaxf(hstart + roi_start_h, 0), height);
        hend = fminf(fmaxf(hend + roi_start_h, 0), height);
        wstart = fminf(fmaxf(wstart + roi_start_w, 0), width);
        wend = fminf(fmaxf(wend + roi_start_w, 0), width);
        bool is_empty = (hend <= hstart) || (wend <= wstart);

        // Define an empty pooling region to be zero
        float maxval = is_empty ? 0 : -FLT_MAX;
        // If nothing is pooled, argmax = -1 causes nothing to be backprop'd
        int maxidx = -1;
        bottom_data += roi_batch_ind * channels * height * width;
        for (int h = hstart; h < hend; ++h) {
            for (int w = wstart; w < wend; ++w) {
    //            int bottom_index = (h * width + w) * channels + c;
                int bottom_index = (c * height + h) * width + w;
                if (bottom_data[bottom_index] > maxval) {
                    maxval = bottom_data[bottom_index];
                    maxidx = bottom_index;
                }
            }
        }
        top_data[index] = maxval;
        if (argmax_data != NULL)
            argmax_data[index] = maxidx;
    }
}
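For intuition, the same max-pooling logic can be sketched in a few lines of pure PyTorch (our own illustration, much slower than the CUDA kernel; roi_pool_py is a hypothetical name, and adaptive pooling's bin boundaries differ slightly from the kernel's floor/ceil scheme):

import torch
import torch.nn.functional as F

def roi_pool_py(features, rois, pooled_h=7, pooled_w=7, spatial_scale=1.0 / 16):
    """features: (1, C, H, W); rois: (R, 5) rows of [batch_idx, x1, y1, x2, y2]."""
    H, W = features.size(2), features.size(3)
    outputs = []
    for roi in rois:
        b = int(roi[0])
        x1, y1, x2, y2 = [int(round(float(v) * spatial_scale)) for v in roi[1:]]
        x1, y1 = max(0, min(x1, W - 1)), max(0, min(y1, H - 1))
        x2, y2 = max(x1 + 1, min(x2, W)), max(y1 + 1, min(y2, H))  # force malformed ROIs to >= 1x1
        crop = features[b:b + 1, :, y1:y2, x1:x2]
        outputs.append(F.adaptive_max_pool2d(crop, (pooled_h, pooled_w)))
    return torch.cat(outputs, dim=0)  # (R, C, pooled_h, pooled_w)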
Back at the network level, the pooled features feed the fully connected head:
class Faster_RCNN(nn.Module):
    def __init__(self, classes=None, debug=False):
        super(Faster_RCNN, self).__init__()
        # definitions before roi pooling omitted

        if classes is not None:
            self.classes = classes
            self.n_classes = len(classes)

        self.rpn = RPN()
        self.roi_pool = RoIPool(7, 7, 1.0/16)
        self.fc6 = FC(512 * 7 * 7, 4096)
        self.fc7 = FC(4096, 4096)
        self.score_fc = FC(4096, self.n_classes, relu=False)
        self.bbox_fc = FC(4096, self.n_classes * 4, relu=False)


    def forward(self, im_data, im_info, gt_boxes=None, gt_ishard=None, dontcare_areas=None):

        # ...... parts before roi pooling omitted

        pooled_features = self.roi_pool(features, rois)


        x = pooled_features.view(pooled_features.size()[0], -1)
        x = self.fc6(x)
        x = F.dropout(x, training=self.training)
        x = self.fc7(x)
        x = F.dropout(x, training=self.training)

        cls_score = self.score_fc(x)
        cls_prob = F.softmax(cls_score, dim=1)
        bbox_pred = self.bbox_fc(x)

        return cls_prob, bbox_pred, rois
And finally the Fast R-CNN head's loss, again a weighted cross-entropy plus smooth L1:
def build_loss(self, cls_score, bbox_pred, roi_data):
        # classification loss
        label = roi_data[1].squeeze()
        fg_cnt = torch.sum(label.data.ne(0))
        bg_cnt = label.data.numel() - fg_cnt

        # for log
        if self.debug:
            maxv, predict = cls_score.data.max(1)
            self.tp = torch.sum(predict[:fg_cnt].eq(label.data[:fg_cnt])) if fg_cnt > 0 else 0
            self.tf = torch.sum(predict[fg_cnt:].eq(label.data[fg_cnt:]))
            self.fg_cnt = fg_cnt
            self.bg_cnt = bg_cnt

        ce_weights = torch.ones(cls_score.size()[1])
        ce_weights[0] = float(fg_cnt) / bg_cnt
        ce_weights = ce_weights.cuda()
        cross_entropy = F.cross_entropy(cls_score, label, weight=ce_weights)

        # bounding box regression L1 loss
        bbox_targets, bbox_inside_weights, bbox_outside_weights = roi_data[2:]
        bbox_targets = torch.mul(bbox_targets, bbox_inside_weights)
        bbox_pred = torch.mul(bbox_pred, bbox_inside_weights)

        loss_box = F.smooth_l1_loss(bbox_pred, bbox_targets, size_average=False) / (fg_cnt + 1e-4)

        return cross_entropy, loss_box