The deployment architecture is fixed. How do you gain accuracy painlessly?

灿视学长

Published 2021-07-30 15:39:39


This article is included in the column: 灿视学长

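Hi everyone, I'm 灿视. Yesterday a good friend of mine published an article titled "How to train object detection models better".

I recommend following the link and reading his write-up. It runs experiments on several fronts, including input data augmentation, network architecture, and the detection head. It also compares different input resolutions and different numbers of proposals in the RPN.

A detailed walkthrough | Google and Waymo show you how to train object detection models better!

That reminded me of a question from an interview I once had at a company: "Take a classification network. Once the network's deployment architecture is fixed, how do we still improve its performance, for example its classification accuracy?"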

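With a fixed deployment architecture, how do we strengthen the network's feature extraction?

Why would anyone need this?

Because deployment has to be taken into account!

Take fine-grained classification, which means distinguishing sub-classes within a broader class. Telling cats from dogs is ordinary classification. Going further and telling huskies from corgis among the dogs is fine-grained classification. The agricultural classification work I currently do is a fine-grained task as well.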

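Some current approaches use Attention to locate regions and then apply all sorts of tricks on top. The problem is that the Sigmoid or Softmax computed inside Attention is expensive, and if a model contains many such operations, deployment becomes hard.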

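Likewise, techniques such as model ensembling either add a lot of deployment work or cannot be deployed at all. So for now I mostly restrict myself to networks like ResNet, which saves a lot of trouble at deployment time.

Here are some suggestions of my own to help everyone slack off a little more effectively. After all, not having to touch the deployment code is quite comfortable.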

General-purpose tricks

The main reference here is the paper "Bag of Tricks for Image Classification with Convolutional Neural Networks", co-authored by Mu Li (李沐).

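Train with low-precision floating point (such as fp16) and a suitably larger batch size.

Mini-batch SGD groups several images into a batch in order to increase parallelism and reduce the overhead of data communication. A very large batch size is not necessarily better, though: for convex optimization problems, the convergence rate (not the result it converges to!) decreases as the batch size grows. In other words, for the same number of epochs, a model trained with a large batch size reaches lower validation accuracy than one trained with a small batch size. When using a large batch, the following settings should be adjusted accordingly: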

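Scale the initial learning rate linearly. As the batch size grows, the learning rate can be increased linearly. A reference formula for the initial learning rate is lr = 0.1 * bs / 256, where bs is the batch size.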
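
As a quick illustration of the linear scaling rule, here is a minimal sketch in plain Python. The function name `scaled_lr` and its default arguments are my own choices for illustration; only the formula lr = 0.1 * bs / 256 comes from the text.

```python
def scaled_lr(batch_size: int, base_lr: float = 0.1, base_batch_size: int = 256) -> float:
    """Linear scaling rule: lr = base_lr * batch_size / base_batch_size."""
    return base_lr * batch_size / base_batch_size


# Print the initial learning rate for a few batch sizes.
for bs in (128, 256, 1024):
    print(bs, scaled_lr(bs))
```

Quadrupling the batch size from 256 to 1024 quadruples the initial learning rate, which is the whole point of the rule.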

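Warm up to the initial learning rate. At the start of training the gradients used for weight updates are large, and using a large learning rate right away can easily make training numerically unstable. So training should begin with a small learning rate and switch to the initial learning rate once it has stabilized. Suppose the first m batches are used for warmup and the initial learning rate is \eta. Then at batch i (1 ≤ i ≤ m) the learning rate is set to lr = i * \eta / m, so that it increases gradually up to the initial learning rate \eta.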
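
The warmup rule can be sketched as a small function. This assumes a 1-based batch counter i and a warmup length m, as in the formula lr = i * \eta / m; the name `warmup_lr` is hypothetical.

```python
def warmup_lr(i: int, m: int, eta: float) -> float:
    """Learning rate at batch i (1-based) with linear warmup over the first m batches."""
    if i < 1:
        raise ValueError("batch index i is 1-based")
    if i <= m:
        # Linear ramp from eta / m up to eta.
        return i * eta / m
    # After warmup, fall back to the initial learning rate
    # (a decay schedule would normally take over from here).
    return eta


# Print the first few learning rates for a 5-batch warmup.
print([round(warmup_lr(i, m=5, eta=0.1), 4) for i in range(1, 8)])
```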

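Initialize the BN-layer \gamma to zero. In a ResNet residual block, the last layer of the non-identity branch may be a BN layer. In the usual initialization scheme, that BN layer's \gamma and \beta are initialized to 1 and 0 respectively. If \gamma is initialized to 0 instead, the residual branch initially outputs zero, so each block starts out as an identity mapping and the network behaves as if it had fewer layers, which makes the early stage of training easier.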
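
To see why a zero \gamma turns the block into an identity mapping, here is a toy NumPy sketch. The "block" is a single matrix multiply followed by batch normalization, which is a deliberate simplification of a real residual block; all names are hypothetical.

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-5):
    """Per-feature batch normalization for inputs of shape (N, C)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def residual_block(x, weight, gamma, beta):
    """Toy residual block: identity shortcut plus BN(x @ weight)."""
    return x + batch_norm(x @ weight, gamma, beta)


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
w = rng.normal(size=(4, 4))

# gamma = 0: the residual branch contributes nothing, so the block is an identity.
out_zero = residual_block(x, w, gamma=np.zeros(4), beta=np.zeros(4))
# gamma = 1 (the usual default): the branch changes the output from the start.
out_one = residual_block(x, w, gamma=np.ones(4), beta=np.zeros(4))

print(np.allclose(out_zero, x), np.allclose(out_one, x))
```

In PyTorch the same effect is usually obtained by zeroing the weight of the last BN layer in each block. torchvision's ResNet constructors expose a `zero_init_residual` flag for this.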

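No bias decay. Apply weight decay (L2 regularization) only to the weights of convolutional and fully connected layers. Other parameters, such as biases and the \gamma and \beta of BN layers, should not be regularized.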
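
One way to implement this is to split the parameters into two groups. The sketch below assumes that biases and BN \gamma/\beta can be recognized as 1-D tensors or by a `.bias` name suffix, which holds for typical CNNs but is still a heuristic. NumPy arrays stand in for real parameters here, and `split_decay_groups` is a hypothetical helper.

```python
import numpy as np


def split_decay_groups(named_params, weight_decay=1e-4):
    """Split (name, parameter) pairs into a decayed and a non-decayed group."""
    decay, no_decay = [], []
    for name, p in named_params:
        # Heuristic: 1-D parameters are biases or BN gamma/beta.
        if p.ndim <= 1 or name.endswith(".bias"):
            no_decay.append(p)
        else:
            decay.append(p)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


# Stand-in parameters with the shapes a small CNN would have.
toy_params = [
    ("conv1.weight", np.zeros((16, 3, 3, 3))),
    ("bn1.weight", np.zeros(16)),
    ("bn1.bias", np.zeros(16)),
    ("fc.weight", np.zeros((10, 16))),
    ("fc.bias", np.zeros(10)),
]
groups = split_decay_groups(toy_params)
print(len(groups[0]["params"]), len(groups[1]["params"]))
```

With a real PyTorch model, the same function could be fed `model.named_parameters()`, and the returned list has the per-parameter-group form that `torch.optim.SGD` accepts as its first argument.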

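Cosine learning rate decay or step decay. The learning rate schedule is critical during training. The most common strategy is step decay, which shrinks the learning rate by a fixed factor after a given number of epochs. With cosine decay, if training consists of T mini-batches in total (ignoring warmup), then the learning rate at the t-th mini-batch is

\eta_{t}=\frac{1}{2}\left(1+\cos \left(\frac{t \pi}{T}\right)\right) \eta

where \eta is the initial learning rate.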
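
A minimal sketch of the cosine schedule, assuming T is the total number of mini-batches after warmup and t counts from 0. The function name `cosine_lr` is hypothetical.

```python
import math


def cosine_lr(t: int, T: int, eta: float) -> float:
    """Cosine decay: eta_t = 0.5 * (1 + cos(t * pi / T)) * eta."""
    return 0.5 * (1.0 + math.cos(t * math.pi / T)) * eta


# The rate starts at eta, passes eta / 2 halfway through, and ends near 0.
for t in (0, 25, 50, 75, 100):
    print(t, round(cosine_lr(t, T=100, eta=0.1), 5))
```

Compared with step decay, the rate falls slowly at first, almost linearly in the middle, and slowly again near the end.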

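Label smoothing. See our earlier article:

Theory and examples showing that label smoothing works!

L1 and L2 regularization. See our earlier article:

Do you really understand L1 and L2 regularization?

The various Drop schemes. Again, see our earlier articles, where we have put together the most complete collection of Drop schemes so far. Here is a question for you, one that a junior labmate of mine was asked in a Tencent interview:

Can Dropout and BN be used together? Why or why not?

The answer is in the following two articles:

Drop! A must-ask question for algorithm roles, worth bookmarking!

Drop again! A must-ask algorithm question!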

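From the data input side

1. Traditional data augmentation

This is basically standard equipment for training a network: random horizontal flips, random vertical flips, contrast enhancement, rotation by some angle, and so on.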

from torchvision import transforms
trans = transforms.Compose([
    transforms.CenterCrop(10),
    transforms.ToTensor(),
])

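'''
Other commonly used transforms:
Resize: resize the given image to the given size
Normalize: normalize a tensor image with mean and standard deviation
ToTensor: convert a PIL image (H*W*C) in the range [0, 255] to a torch.Tensor (C*H*W) in the range [0.0, 1.0]
ToPILImage: convert a tensor to a PIL image
Scale: deprecated, use Resize instead
CenterCrop: crop the central region of the image
RandomCrop: crop at a random location
RandomHorizontalFlip: horizontally flip the given PIL image with probability 0.5
RandomVerticalFlip: vertically flip the given PIL image with probability 0.5
RandomResizedCrop: crop a random region with random size and aspect ratio, then resize it to the given size
Grayscale: convert the image to grayscale
RandomGrayscale: convert the image to grayscale with a given probability
FiveCrop: crop the image into four corners and the center
Pad: pad the image
ColorJitter: randomly change the brightness, contrast, and saturation of the image
'''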

2. Augmentations such as mixup / CutMix / Mosaic / Cutout / Random Erasing

mixup

Code first. I have been asked to write this in interviews:

import numpy as np
import torch

def mixup_data(x, y, alpha=1.0, use_cuda=True):
    '''Compute the mixup data. Return mixed inputs, pairs of targets, and lambda'''
    # Sample the mixing coefficient from Beta(alpha, alpha); alpha <= 0 disables mixing
    if alpha > 0.:
        lam = np.random.beta(alpha, alpha)
    else:
        lam = 1.
    batch_size = x.size()[0]
    # Random permutation that pairs each sample with another sample of the same batch
    if use_cuda:
        index = torch.randperm(batch_size).cuda()
    else:
        index = torch.randperm(batch_size)

    mixed_x = lam * x + (1 - lam) * x[index,:]
    y_a, y_b = y, y[index]
    return mixed_x, y_a, y_b, lam

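As the code shows, mixup_data does not draw two batches at once. It takes a single batch, shuffles the order of the samples within that batch, and then forms a weighted sum of the original and the shuffled batch.

The steps are:

1. For an input batch of training images images, blend each image with a randomly drawn image from the same batch using the mixing ratio lam, which gives the mixed tensor inputs.

2. The mixing ratio lam in step 1 is a random real number in [0,1] drawn from a beta distribution. The two images are added pixel by pixel, i.e. inputs = lam * images + (1 - lam) * images_random. The mixed tensor inputs from step 1 is then passed to the model to obtain the output tensor outputs.

3. When computing the loss, the loss is computed separately against the labels of the two images and the two losses are combined with the weight lam, i.e. loss = lam * criterion(outputs, targets_a) + (1 - lam) * criterion(outputs, targets_b).

4. Backpropagate and update the parameters.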

Reference code:

import torch.nn as nn
import torch.optim as optim

# net, trainloader, args, use_cuda and base_learning_rate are assumed to be defined elsewhere
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=base_learning_rate, momentum=0.9, weight_decay=args.decay)

def mixup_criterion(y_a, y_b, lam):
    # Returns a function computing the lam-weighted sum of the losses against both label sets
    return lambda criterion, pred: lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)

# Training
def train(epoch):
    print('\nEpoch: %d' % epoch)
    net.train()
    for batch_idx, (inputs, targets) in enumerate(trainloader):
        if use_cuda:
            inputs, targets = inputs.cuda(), targets.cuda()
        # Generate the mixed inputs, the two sets of class-index targets, and the mixing coefficient
        inputs, targets_a, targets_b, lam = mixup_data(inputs, targets, args.alpha, use_cuda)
        outputs = net(inputs)
        # Compute the loss
        loss_func = mixup_criterion(targets_a, targets_b, lam)
        loss = loss_func(criterion, outputs)
        # Update the parameters
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
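
It may help to see why weighting the two losses is the same as training against mixed labels. The NumPy sketch below checks numerically that lam * CE(pred, y_a) + (1 - lam) * CE(pred, y_b) equals the cross-entropy against the soft label lam * onehot(y_a) + (1 - lam) * onehot(y_b). All helper names are hypothetical and the logits are random stand-ins for model outputs.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def cross_entropy(logits, targets):
    """Mean cross-entropy against integer class indices of shape (N,)."""
    logp = log_softmax(logits)
    return -logp[np.arange(len(targets)), targets].mean()


def soft_cross_entropy(logits, soft_targets):
    """Mean cross-entropy against soft label distributions of shape (N, K)."""
    return -(soft_targets * log_softmax(logits)).sum(axis=1).mean()


rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))        # stand-in for model outputs
y_a = np.array([0, 1, 2, 1])
y_b = y_a[rng.permutation(4)]           # labels of the shuffled batch
lam = 0.3

pairwise = lam * cross_entropy(logits, y_a) + (1 - lam) * cross_entropy(logits, y_b)
soft_labels = lam * np.eye(3)[y_a] + (1 - lam) * np.eye(3)[y_b]
mixed = soft_cross_entropy(logits, soft_labels)
print(np.isclose(pairwise, mixed))
```

The equality follows from cross-entropy being linear in the target distribution, which is why mixup implementations can keep integer labels and avoid building one-hot vectors.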

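CutMix

Let's first look at the effect:

[Figure: CutMix example]

Procedure: CutMix operates on a pair of images. Put simply, it randomly generates a crop box Box, cuts out the corresponding region of image A, and fills that region with the ROI at the same position in image B, forming a new sample. The loss is again computed as a weighted sum.

The combination of the two images is defined as:

\begin{array}{l} \tilde{x}=\mathbf{M} \odot x_{A}+(\mathbf{1}-\mathbf{M}) \odot x_{B} \\ \tilde{y}=\lambda y_{A}+(1-\lambda) y_{B}, \end{array}

Here \mathbf{M} is a binary 0/1 matrix that marks which positions are removed and filled in from the other image, i.e. which region is cut out and which is kept: entries inside the cut region are 0 and all other entries are 1. \mathbf{1} is the all-ones matrix with the same dimensions as \mathbf{M}. Images A and B are combined to obtain the new sample, and the labels of the two images are combined with the corresponding weights. As in mixup, the weight \lambda is drawn from a beta distribution Beta(\alpha, \alpha). The paper sets \alpha to 1, in which case \lambda is uniformly distributed on (0, 1).

CutMix replaces an image region with a patch from another training image, and it produces images that look more natural locally than those from Mixup.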
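
To connect the formula with code, here is a small NumPy sketch that builds the mask \mathbf{M} explicitly for one image pair. It assumes (C, H, W) arrays and a box whose side lengths scale with sqrt(1 - lam); the function name `cutmix_pair` is hypothetical.

```python
import numpy as np


def cutmix_pair(x_a, x_b, lam, rng):
    """Mix two (C, H, W) images with an explicit binary mask M (0 inside the box)."""
    _, H, W = x_a.shape
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(H * cut_ratio), int(W * cut_ratio)
    # Uniformly sampled box center, box clipped to the image.
    cy, cx = int(rng.integers(H)), int(rng.integers(W))
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, H)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, W)

    M = np.ones((H, W), dtype=x_a.dtype)
    M[y1:y2, x1:x2] = 0
    mixed = M * x_a + (1 - M) * x_b     # x~ = M * x_A + (1 - M) * x_B
    lam_adjusted = float(M.mean())      # fraction of pixels kept from x_A
    return mixed, lam_adjusted, M


rng = np.random.default_rng(0)
x_a = np.zeros((3, 8, 8))
x_b = np.ones((3, 8, 8))
mixed, lam_adjusted, M = cutmix_pair(x_a, x_b, lam=0.5, rng=rng)
print(mixed.shape, lam_adjusted)
```

Because the box is clipped at the image border, the kept fraction can differ from the sampled lam, so the label weight is recomputed from the mask.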

import time
import numpy as np
import torch

def rand_bbox(size, lam):
    """Inputs: the sample size (N, C, H, W) and the sampled lambda value."""
    # Named as in the original code. In NCHW layout dim 2 is the height and dim 3
    # the width, but the two are used consistently below.
    W = size[2]
    H = size[3]
    # Eq. 2 in the paper: width r_w and height r_h of box B
    cut_rat = np.sqrt(1. - lam)
    cut_w = int(W * cut_rat)  # np.int was removed in NumPy 1.24, use the built-in int
    cut_h = int(H * cut_rat)

    # Eq. 2 in the paper: center (r_x, r_y) of box B, sampled uniformly
    cx = np.random.randint(W)
    cy = np.random.randint(H)

    # Clip the coordinates of B so that the box stays inside the sample
    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)

    return bbx1, bby1, bbx2, bby2

# train_loader, model, criterion, args, data_time and end are assumed to be
# defined by the surrounding training script
for i, (input, target) in enumerate(train_loader):
    # measure data loading time
    data_time.update(time.time() - end)

    input = input.cuda()
    target = target.cuda()

    r = np.random.rand(1)
    if args.beta > 0 and r < args.cutmix_prob:
        # generate mixed sample: lambda follows a beta distribution
        lam = np.random.beta(args.beta, args.beta)
        rand_index = torch.randperm(input.size()[0]).cuda()
        # pair every sample with a randomly chosen sample of the same batch
        target_a = target
        target_b = target[rand_index]
        # coordinates of the box to cut
        bbx1, bby1, bbx2, bby2 = rand_bbox(input.size(), lam)
        # replace region B of sample A with region B of sample B
        input[:, :, bbx1:bbx2, bby1:bby2] = input[rand_index, :, bbx1:bbx2, bby1:bby2]
        # adjust lambda to exactly match pixel ratio
        lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (input.size()[-1] * input.size()[-2]))
        # compute output
        output = model(input)
        # compute the lam-weighted loss
        loss = criterion(output, target_a) * lam + criterion(output, target_b) * (1. - lam)
    else:
        # compute output
        output = model(input)
        loss = criterion(output, target)

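Mosaic

The main idea of Mosaic augmentation is to randomly crop four images and stitch them into a single training image. This enriches the image backgrounds, and because four images are stitched together it effectively increases the batch size: batch normalization statistics are computed over content from four images at once, so training depends less on the batch size itself.

The figure shows one possible Mosaic layout. The code has to adjust the crop coordinates according to where the image boundaries fall.

Here, once again, is code from the experts: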


import os
import numpy as np
import cv2
import random
import math
 
def random_affine(img, targets=(), degrees=10, translate=.1, scale=.1, shear=10, border=0):
    # torchvision.transforms.RandomAffine(degrees=(-10, 10), translate=(.1, .1), scale=(.9, 1.1), shear=(-10, 10))
    # https://medium.com/uruvideo/dataset-augmentation-with-random-homographies-a8f4b44830d4
    # targets = [cls, xyxy]
 
    height = img.shape[0] + border * 2
    width = img.shape[1] + border * 2
 
    # Rotation and Scale
    R = np.eye(3)
    a = random.uniform(-degrees, degrees)
    # a += random.choice([-180, -90, 0, 90])  # add 90deg rotations to small rotations
    s = random.uniform(1 - scale, 1 + scale)
    # s = 2 ** random.uniform(-scale, scale)
    R[:2] = cv2.getRotationMatrix2D(angle=a, center=(img.shape[1] / 2, img.shape[0] / 2), scale=s)
 
    # Translation
    T = np.eye(3)
    T[0, 2] = random.uniform(-translate, translate) * img.shape[1] + border  # x translation (pixels), scaled by width
    T[1, 2] = random.uniform(-translate, translate) * img.shape[0] + border  # y translation (pixels), scaled by height
 
    # Shear
    S = np.eye(3)
    S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # x shear (deg)
    S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # y shear (deg)
 
    # Combined rotation matrix
    M = S @ T @ R  # ORDER IS IMPORTANT HERE!!
    if (border != 0) or (M != np.eye(3)).any():  # image changed
        img = cv2.warpAffine(img, M[:2], dsize=(width, height), flags=cv2.INTER_LINEAR, borderValue=(114, 114, 114))
 
    # Transform label coordinates
    n = len(targets)
    if n:
        # warp points
        xy = np.ones((n * 4, 3))
        xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2)  # x1y1, x2y2, x1y2, x2y1
        xy = (xy @ M.T)[:, :2].reshape(n, 8)
        # create new boxes
        x = xy[:, [0, 2, 4, 6]]
        y = xy[:, [1, 3, 5, 7]]
        xy = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T
 
        # # apply angle-based reduction of bounding boxes
        # radians = a * math.pi / 180
        # reduction = max(abs(math.sin(radians)), abs(math.cos(radians))) ** 0.5
        # x = (xy[:, 2] + xy[:, 0]) / 2
        # y = (xy[:, 3] + xy[:, 1]) / 2
        # w = (xy[:, 2] - xy[:, 0]) * reduction
        # h = (xy[:, 3] - xy[:, 1]) * reduction
        # xy = np.concatenate((x - w / 2, y - h / 2, x + w / 2, y + h / 2)).reshape(4, n).T
 
        # reject warped points outside of image
        xy[:, [0, 2]] = xy[:, [0, 2]].clip(0, width)
        xy[:, [1, 3]] = xy[:, [1, 3]].clip(0, height)
        w = xy[:, 2] - xy[:, 0]
        h = xy[:, 3] - xy[:, 1]
        area = w * h
        area0 = (targets[:, 3] - targets[:, 1]) * (targets[:, 4] - targets[:, 2])
        ar = np.maximum(w / (h + 1e-16), h / (w + 1e-16))  # aspect ratio
        i = (w > 4) & (h > 4) & (area / (area0 * s + 1e-16) > 0.2) & (ar < 10)
 
        targets = targets[i]
        targets[:, 1:5] = xy[i]
 
    return img, targets
 
def load_image(img_files, index,img_size=640):
    # loads 1 image from dataset, returns img, original hw, resized hw
    path = img_files[index]
    img = cv2.imread(path)  # BGR
    assert img is not None, 'Image Not Found ' + path
    h0, w0 = img.shape[:2]  # orig hw
    r = img_size / max(h0, w0)  # resize image to img_size
    if r != 1:  # always resize down, only resize up if training with augmentation
        img = cv2.resize(img, (int(w0 * r), int(h0 * r)), interpolation=1)
    return img, (h0, w0), img.shape[:2]  # img, hw_original, hw_resized
 
def load_mosaic(img_files,index,img_size,labels):
    # loads images in a mosaic
    labels4 = []
    s = img_size
    xc, yc = [int(random.uniform(s * 0.5, s * 1.5)) for _ in range(2)]  # mosaic center x, y
    indices = [index] + [random.randint(0, len(labels) - 1) for _ in range(3)]  # 3 additional image indices
    for i, index in enumerate(indices):
        # Load image
        img, _, (h, w) = load_image(img_files, index, img_size=s)
        # place img in img4
        if i == 0:  # top left
            img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles
            x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)
            x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)
        elif i == 1:  # top right
            x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
            x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
        elif i == 2:  # bottom left
            x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
            x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
        elif i == 3:  # bottom right
            x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
            x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)
        img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # img4[ymin:ymax, xmin:xmax]
        padw = x1a - x1b
        padh = y1a - y1b
 
        # Labels
        x = labels[index]
        labels_ = x.copy()
        if x.size > 0:  # Normalized xywh to pixel xyxy format
            labels_[:, 1] = w * (x[:, 1] - x[:, 3] / 2) + padw
            labels_[:, 2] = h * (x[:, 2] - x[:, 4] / 2) + padh
            labels_[:, 3] = w * (x[:, 1] + x[:, 3] / 2) + padw
            labels_[:, 4] = h * (x[:, 2] + x[:, 4] / 2) + padh
        labels4.append(labels_)
 
    # Concat/clip labels
    if len(labels4):
        labels4 = np.concatenate(labels4, 0)
        # np.clip(labels4[:, 1:] - s / 2, 0, s, out=labels4[:, 1:])  # use with center crop
        np.clip(labels4[:, 1:], 0, 2 * s, out=labels4[:, 1:])  # use with random_affine
 
    # Augment
    # img4 = img4[s // 2: int(s * 1.5), s // 2:int(s * 1.5)]  # center crop (WARNING, requires box pruning)
    img4, labels4 = random_affine(img4, labels4,
                                  degrees=0.0,
                                  translate=0.0,
                                  scale=0.5,
                                  shear=0.0,
                                  border=-s // 2)  # border to remove
    return img4, labels4
 
img_files = []  # list of image paths
labelss = []  # per image: [class, normalized center x, normalized center y, normalized w, normalized h]
for fname in os.listdir(r'G:\dirsfirst\data\dataset\labels'):
    name = fname.replace('txt','jpg')
    path = os.path.join(r'G:\dirsfirst\data\dataset\images',name)
    img_files.append(path)
    with open(os.path.join(r'G:\dirsfirst\data\dataset\labels',fname)) as f:
        # Assumes a single box per label file; the class id is forced to 0
        row = [0]
        data = f.read().strip().split(' ')
        for d in data[1:]:
            row.append(float(d))
        labelss.append(np.array([row]))
for index in range(len(labelss)):
    img, labels =load_mosaic(img_files=img_files,index=index,img_size=640,labels=labelss)
    for label in labels:
        x1 = int(label[1])
        y1 = int(label[2])
        x2 = int(label[3])
        y2 = int(label[4])
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.imshow('',img)
    cv2.waitKey(0)

Cutout

The effect is shown in the figure:

[Figure: Cutout example]

Reference code:

import numpy as np
import torch

class Cutout(object):
    """Randomly mask out one or more patches from an image.
    Args:
        n_holes (int): Number of patches to cut out of each image.
        length (int): The length (in pixels) of each square patch.
    """
    def __init__(self, n_holes, length):
        self.n_holes = n_holes
        self.length = length

    def __call__(self, img):
        """
        Args:
            img (Tensor): Tensor image of size (C, H, W).
        Returns:
            Tensor: Image with n_holes of dimension length x length cut out of it.
        """
        h = img.size(1)
        w = img.size(2)

        mask = np.ones((h, w), np.float32)

        for n in range(self.n_holes):
            # (x, y) is the center of the square patch
            y = np.random.randint(h)
            x = np.random.randint(w)

            y1 = np.clip(y - self.length // 2, 0, h)
            y2 = np.clip(y + self.length // 2, 0, h)
            x1 = np.clip(x - self.length // 2, 0, w)
            x2 = np.clip(x + self.length // 2, 0, w)

            mask[y1: y2, x1: x2] = 0.

        mask = torch.from_numpy(mask)
        mask = mask.expand_as(img)
        img = img * mask

        return img

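3. CAM-based erasing

Here I can recommend a few articles on CAM (see the third reference link). What is CAM? Let's first look at a visualization:

[Figure: CAM visualization]

CAM (class activation mapping) highlights the most discriminative regions of an image. Current CAM methods fall mainly into the following categories: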

gradient-based:
- Grad-CAM (2016.10)
- Grad-CAM++ (2017.10)
- Smooth Grad-CAM++ (2019.08)
gradient-free:
- CAM (2015.12)
- score-CAM (2019.10)
- ss-CAM (2020.06)
- Ablation-CAM (2020)

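There are currently two main kinds of erasing schemes:

Erasing on the original image. This resembles data augmentation: it pushes the network to attend to other regions of the image.
Erasing on the feature maps. The erasing is driven by the network's own predictions, like an attention mechanism used in reverse. For example, the network takes the original image as input, the CAM computed on the fly during training is used for erasing, and the loss is computed afterwards. This cycle of training, erasing, training again, and erasing again is usually called adversarial erasing.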
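
Erasing on the original image can be sketched in a few lines of NumPy. The sketch assumes a CAM has already been computed and resized to the image resolution; the threshold of 0.6 and the name `erase_by_cam` are illustrative assumptions, not values taken from any of the papers.

```python
import numpy as np


def erase_by_cam(image, cam, threshold=0.6, fill=0.0):
    """Erase the most discriminative region of a (C, H, W) image given an (H, W) CAM."""
    cam = cam.astype(np.float64)
    # Normalize the CAM to [0, 1] so that the threshold is scale independent.
    cam_norm = (cam - cam.min()) / (cam.max() - cam.min() + 1e-12)
    keep = cam_norm < threshold                       # True where pixels are kept
    erased = np.where(keep[None, :, :], image, fill)  # broadcast over channels
    return erased, keep


image = np.ones((3, 4, 4))
cam = np.zeros((4, 4))
cam[1, 1] = 1.0                                       # one strongly activated location
erased, keep = erase_by_cam(image, cam)
print(erased[:, 1, 1], int(keep.sum()))
```

Feeding the erased image back into training forces the network to find evidence outside the region it already relies on.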

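Here are a few papers that can be used to improve the backbone:

AE-PSL

This paper uses classification labels for weakly supervised segmentation and obtains good results. It recursively takes the image with the regions erased in the previous stage as the input of the next stage. I have tried applying the idea to classification, and the final results improved.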

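ACoL

The method is as shown in the figure. Suppose there are two classifiers A and B. Classifier A is trained first, and the feature-map regions corresponding to the target class are located. While training B, those regions of the feature map are erased (replaced with 0). Because B is still trained with supervision, B learns other regions of the same class. Of course, A and B can also be the same network, i.e. the erased result is fed back for the network to keep learning.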

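ADL

Similarly, this paper uses classification labels for weakly supervised object localization. It combines spatial attention with feature erasing in a module that is inserted at each convolutional layer and has no learnable parameters. During training, either the importance map or the drop mask is randomly chosen and applied to the feature map. The same scheme can be brought into our own classification networks.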
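
The ADL idea can be sketched roughly as follows. This is a simplified NumPy version for a single (C, H, W) feature map with non-negative activations (for example after ReLU). The defaults for `drop_rate` and `gamma` are assumptions for illustration, and the module is only meant to be active during training.

```python
import numpy as np


def adl(features, drop_rate=0.75, gamma=0.9, rng=None):
    """Attention-based dropout sketch for one (C, H, W) feature map."""
    if rng is None:
        rng = np.random.default_rng()
    # Self-attention map: channel-wise average of the features.
    attention = features.mean(axis=0)
    if rng.random() < drop_rate:
        # Drop mask: zero out the most activated (most discriminative) locations.
        selected = (attention < gamma * attention.max()).astype(features.dtype)
    else:
        # Importance map: sigmoid of the attention, emphasizing informative locations.
        selected = 1.0 / (1.0 + np.exp(-attention))
    # Broadcast the selected (H, W) map over all channels; no learnable parameters.
    return features * selected[None, :, :]


feat = np.ones((2, 3, 3))
feat[:, 0, 0] = 10.0                                   # a dominant activation
dropped = adl(feat, drop_rate=1.0, rng=np.random.default_rng(0))
print(dropped[:, 0, 0], dropped[:, 1, 1])
```

Since the module only rescales or zeroes features, it can be removed at inference time without changing the deployed architecture.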

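4. Attention-based cropping and erasing

Here Attention is used during training to guide the erasing. In my own project I have tried WS-DAN (https://arxiv.org/abs/1901.09891).

As the figure shows, training lets the network learn the most discriminative parts of the object. Cropping guided by attention therefore yields better parts than random cropping:

Likewise, attention-guided erasing removes the key information.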
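
A rough sketch of attention-guided cropping and dropping in NumPy. It assumes a single attention map already resized to the image resolution, and the thresholds `theta_c` and `theta_d` are illustrative values. The helper names are hypothetical and do not come from the WS-DAN code.

```python
import numpy as np


def normalize(att):
    """Scale an attention map to [0, 1]."""
    return (att - att.min()) / (att.max() - att.min() + 1e-12)


def attention_crop_box(att, theta_c=0.5):
    """Bounding box (y1, y2, x1, x2) around locations whose attention exceeds theta_c."""
    ys, xs = np.where(normalize(att) > theta_c)
    if ys.size == 0:
        # Flat attention map: fall back to the whole image.
        return 0, att.shape[0], 0, att.shape[1]
    return int(ys.min()), int(ys.max()) + 1, int(xs.min()), int(xs.max()) + 1


def attention_drop_mask(att, theta_d=0.5):
    """Mask that is 0 where attention exceeds theta_d and 1 elsewhere."""
    return (normalize(att) <= theta_d).astype(np.float32)


att = np.zeros((6, 6))
att[2:4, 3:5] = 1.0                        # the attended part
y1, y2, x1, x2 = attention_crop_box(att)
mask = attention_drop_mask(att)
print((y1, y2, x1, x2), int(mask.sum()))
```

The cropped box would be resized and fed to the network as a zoomed-in view, while the drop mask is multiplied onto the image so the network has to look for other parts.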

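On the network side

1. Modify the network with re-parameterization

RepVGG, for example, extracts features with kernels of different sizes during training and fuses them at test time, so the deployment code does not need to change.

For re-parameterization and RepVGG, see our earlier article:

Re-parameterization tricks in deep learning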
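
The fusion step can be checked numerically. The NumPy sketch below merges a 3x3 branch, a 1x1 branch, and an identity branch into a single 3x3 kernel. It assumes equal input and output channels, stride 1, no biases, and that any BN layers have already been folded into the kernels. The naive `conv2d_same` is only there to verify the equivalence.

```python
import numpy as np


def conv2d_same(x, k):
    """Naive stride-1 cross-correlation with zero padding. x: (C_in, H, W), k: (C_out, C_in, kh, kw)."""
    c_out, _, kh, kw = k.shape
    pad = kh // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    H, W = x.shape[1:]
    out = np.zeros((c_out, H, W))
    for i in range(kh):
        for j in range(kw):
            # Each kernel tap is a (C_out, C_in) matrix applied to a shifted view of the input.
            out += np.einsum("oc,chw->ohw", k[:, :, i, j], xp[:, i:i + H, j:j + W])
    return out


def fuse_branches(k3, k1):
    """Fold a 1x1 kernel and an identity branch into a 3x3 kernel."""
    c = k3.shape[0]
    fused = k3.copy()
    fused[:, :, 1, 1] += k1[:, :, 0, 0]                   # 1x1 kernel sits at the center tap
    fused[np.arange(c), np.arange(c), 1, 1] += 1.0        # identity = per-channel center 1
    return fused


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5, 5))
k3 = rng.normal(size=(4, 4, 3, 3))
k1 = rng.normal(size=(4, 4, 1, 1))

y_train = conv2d_same(x, k3) + conv2d_same(x, k1) + x     # three branches at training time
y_deploy = conv2d_same(x, fuse_branches(k3, k1))          # one 3x3 conv at deployment
print(np.allclose(y_train, y_deploy))
```

The deployed network is a plain stack of 3x3 convolutions, which is why the extra training-time branches cost nothing at inference.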

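2. Add more suitable loss constraints

One example is the DCL paper. Its training procedure is as follows:

It uses a jigsaw trick that shuffles image patches and records their coordinates, and then adds two extra losses, one for predicting the coordinates and one for judging whether the image has been shuffled. At test time these branches are removed and only the backbone is kept. Very handy!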
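
The jigsaw step can be sketched as below. This is a simplified version that applies a uniform random permutation to an n x n grid of patches, which is an assumption made for brevity rather than the exact shuffling rule of the paper. The returned permutation is the kind of coordinate information an auxiliary location-prediction loss could use.

```python
import numpy as np


def shuffle_patches(image, n, rng):
    """Split an (H, W, C) image into n x n patches and shuffle them.

    Returns the shuffled image and perm, where perm[p] is the original
    index of the patch now placed at grid position p (row-major).
    """
    H, W, C = image.shape
    ph, pw = H // n, W // n          # assumes H and W are divisible by n
    patches = (image.reshape(n, ph, n, pw, C)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(n * n, ph, pw, C))
    perm = rng.permutation(n * n)
    shuffled = patches[perm]
    out = (shuffled.reshape(n, n, ph, pw, C)
                   .transpose(0, 2, 1, 3, 4)
                   .reshape(H, W, C))
    return out, perm


rng = np.random.default_rng(0)
img = np.arange(4 * 4).reshape(4, 4, 1)
shuffled, perm = shuffle_patches(img, n=2, rng=rng)
print(perm)
print(shuffled[:, :, 0])
```

Only the training pipeline sees the shuffled images and the extra heads. The deployed backbone is unchanged.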

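Summary

This article looks at how to improve a network's feature extraction, for example in a classification network, when the deployment architecture cannot change, and shares some of my own engineering experience. Not all of it is necessarily correct, and I hope you will add your own approaches. Respect!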

Reference links

https://blog.csdn.net/c2250645962/article/details/106193051
https://zhuanlan.zhihu.com/p/83456995
(CAM): https://zhuanlan.zhihu.com/p/269702192
(ADL): https://arxiv.org/abs/1908.10028
(AE-PSL): https://arxiv.org/pdf/1703.08448.pdf
(ACoL): https://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_Adversarial_Complementary_Learning_CVPR_2018_paper.pdf
(DCL): https://openaccess.thecvf.com/content_CVPR_2019/papers/Chen_Destruction_and_Construction_Learning_for_Fine-Grained_Image_Recognition_CVPR_2019_paper.pdf

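This article is part of the Tencent Cloud self-media sync exposure program and was shared from a WeChat official account.

Originally published: 2021-07-13. In case of infringement, contact cloudcommunity@tencent.com for removal.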
