前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >目标检测基本概念与性能评价指标计算

目标检测基本概念与性能评价指标计算

作者头像
嵌入式视觉
发布2022-09-05 14:23:35
7670
发布2022-09-05 14:23:35
举报
文章被收录于专栏:嵌入式视觉

Contents

前言

不同的问题和不同的数据集都会有不同的模型评价指标,比如分类问题,数据集类别平衡的情况下可以使用准确率作为评价指标,但是现实中的数据集几乎都是类别不平衡的,所以一般都是采用 AP 作为评价指标,分别计算每个类别的 AP,再计算mAP

anchors

所谓 anchors,实际上就是一组由 generate_anchors.py 生成的矩形框。其中每行的4个值 (x1,y1,x2,y2) 表矩形左上和右下角点坐标。9个矩形共有3种形状,长宽比为大约为 {1:1, 1:2, 2:1} 三种, 实际上通过anchors就引入了检测中常用到的多尺度方法。generate_anchors.py的代码如下:

代码语言:javascript
复制
import numpy as np
import six
from six import __init__  # 兼容python2和python3模块


def generate_anchor_base(base_size=16, ratios=[0.5, 1, 2],
                         anchor_scales=[8, 16, 32]):
    """Generate anchor base windows by enumerating aspect ratio and scales.

    Generate anchors that are scaled and modified to the given aspect ratios.
    Area of a scaled anchor is preserved when modifying to the given aspect
    ratio.

    :obj:`R = len(ratios) * len(anchor_scales)` anchors are generated by this
    function.
    The :obj:`i * len(anchor_scales) + j` th anchor corresponds to an anchor
    generated by :obj:`ratios[i]` and :obj:`anchor_scales[j]`.

    For example, if the scale is :math:`8` and the ratio is :math:`0.25`,
    the width and the height of the base window will be stretched by :math:`8`.
    For modifying the anchor to the given aspect ratio,
    the height is halved and the width is doubled.

    Args:
        base_size (number): The width and the height of the reference window.
        ratios (list of floats): This is ratios of width to height of
            the anchors.
        anchor_scales (list of numbers): This is areas of anchors.
            Those areas will be the product of the square of an element in
            :obj:`anchor_scales` and the original area of the reference
            window.

    Returns:
        ~numpy.ndarray:
        An array of shape :math:`(R, 4)`.
        Each element is a set of coordinates of a bounding box.
        The second axis corresponds to
        :math:`(x_{min}, y_{min}, x_{max}, y_{max})` of a bounding box.

    """
    import numpy as np
    py = base_size / 2.
    px = base_size / 2.

    anchor_base = np.zeros((len(ratios) * len(anchor_scales), 4),
                           dtype=np.float32)
    for i in six.moves.range(len(ratios)):
        for j in six.moves.range(len(anchor_scales)):
            h = base_size * anchor_scales[j] * np.sqrt(ratios[i])
            w = base_size * anchor_scales[j] * np.sqrt(1. / ratios[i])

            index = i * len(anchor_scales) + j
            anchor_base[index, 0] = px - w / 2.
            anchor_base[index, 1] = py - h / 2.

            anchor_base[index, 2] = px + h / 2.
            anchor_base[index, 3] = py + w / 2.
    return anchor_base


# test
if __name__ == "__main__":
    bbox_list = generate_anchor_base()
    print(bbox_list)

程序运行输出如下:

[[ -82.50967 -37.254833 53.254833 98.50967 ] [-173.01933 -82.50967 98.50967 189.01933 ] [-354.03867 -173.01933 189.01933 370.03867 ] [ -56. -56. 72. 72. ] [-120. -120. 136. 136. ] [-248. -248. 264. 264. ] [ -37.254833 -82.50967 98.50967 53.254833] [ -82.50967 -173.01933 189.01933 98.50967 ] [-173.01933 -354.03867 370.03867 189.01933 ]]

交并比IOU

交并比(Intersection-over-Union,IoU),目标检测中使用的一个概念,是产生的候选框(candidate bound)与原标记框(ground truth bound)的交叠率,即它们的交集与并集的比值。最理想情况是完全重叠,即比值为1。 计算公式如下:

IOU计算公式

IOU计算代码实现如下:

代码语言:javascript
复制
# _*_ coding:utf-8 _*_
# 计算iou

"""
bbox的数据结构为(xmin,ymin,xmax,ymax)--(x1,y1,x2,y2),
每个bounding box的左上角和右下角的坐标
输入:
    bbox1, bbox2: Single numpy bounding box, Shape: [4]
输出:
    iou值
"""
import numpy as np
import cv2

def iou(bbox1, bbox2):
    """
    计算两个bbox(两框的交并比)的iou值
    :param bbox1: (x1,y1,x2,y2), type: ndarray or list
    :param bbox2: (x1,y1,x2,y2), type: ndarray or list
    :return: iou, type float
    """
    if type(bbox1) or type(bbox2) != 'ndarray':
        bbox1 = np.array(bbox1)
        bbox2 = np.array(bbox2)

    assert bbox1.size == 4 and bbox2.size == 4, "bounding box coordinate size must be 4"
    xx1 = np.max((bbox1[0], bbox2[0]))
    yy1 = np.max((bbox1[1], bbox1[1]))
    xx2 = np.min((bbox1[2], bbox2[2]))
    yy2 = np.min((bbox1[3], bbox2[3]))
    bwidth = xx2 - xx1
    bheight = yy2 - yy1
    area = bwidth * bheight  # 求两个矩形框的交集
    union = (bbox1[2] - bbox1[0])*(bbox1[3] - bbox1[1]) + (bbox2[2] - bbox2[0])*(bbox2[3] - bbox2[1]) - area  # 求两个矩形框的并集
    iou = area / union

    return iou


if __name__=='__main__':
    rect1 = (461, 97, 599, 237)
    # (top, left, bottom, right)
    rect2 = (522, 127, 702, 257)
    iou_ret = round(iou(rect1, rect2), 3) # 保留3位小数
    print(iou_ret)

    # Create a black image
    img=np.zeros((720,720,3), np.uint8)
    cv2.namedWindow('iou_rectangle')
    """
    cv2.rectangle 的 pt1 和 pt2 参数分别代表矩形的左上角和右下角两个点,
    coordinates for the bounding box vertices need to be integers if they are in a tuple,
    and they need to be in the order of (left, top) and (right, bottom). 
    Or, equivalently, (xmin, ymin) and (xmax, ymax).
    """
    cv2.rectangle(img,(461, 97),(599, 237),(0,255,0),3)
    cv2.rectangle(img,(522, 127),(702, 257),(0,255,0),3)
    font  = cv2.FONT_HERSHEY_SIMPLEX
    cv2.putText(img, 'IoU is ' + str(iou_ret), (341,400), font, 1,(255,255,255),1)
    cv2.imshow('iou_rectangle', img)
    cv2.waitKey(0)

IoU代码输出结果如下所示:

IoU计算代码输出结果图

非极大值抑制算法NMS

NMS介绍

在目标检测中,常会利用非极大值抑制算法(NMS,non maximum suppression)对生成的大量候选框进行后处理,去除冗余的候选框,得到最佳检测框,以加快目标检测的效率。其本质思想是其思想是搜素局部最大值,抑制非极大值。非极大值抑制,在计算机视觉任务中得到了广泛的应用,例如边缘检测、人脸检测、目标检测(DPM,YOLO,SSD,Faster R-CNN)等。即如下图所示实现效果,消除多余的候选框,找到最佳的 bboxNMS过程如下图所示:

NMS过程

以上图为例,每个选出来的 Bounding Box 检测框(既BBox)用(x,y,h,w, confidence score,Pdog,Pcat)表示,confidence score 表示 background 和 foreground 的置信度得分,取值范围[0,1]。Pdog, Pcat分布代表类别是狗和猫的概率。如果是 100 类的目标检测模型,BBox 输出向量为 5+100=105

NMS算法

NMS主要就是通过迭代的形式,不断地以最大得分的框去与其他框做IoU操作,并过滤那些IoU较大的框。  其实现的思想主要是将各个框的置信度进行排序,然后选择其中置信度最高的框A,将其作为标准选择其他框,同时设置一个阈值,当其他框B与A的重合程度超过阈值就将B舍弃掉,然后在剩余的框中选择置信度最大的框,重复上述操作。**算法实现过程如下**:

  1. 根据候选框类别分类概率排序:F>E>D>C>B>A,并标记最大概率的矩形框F作为标准框。
  2. 分别判断A~E与F的重叠度IOU(两框的交并比)是否大于某个设定的阈值,假设B、D与F的重叠度超过阈值,那么就扔掉B、D;
  3. 从剩下的矩形框A、C、E中,选择概率最大的E,标记为要保留下来的,然后判读E与A、C的重叠度,扔掉重叠度超过设定阈值的矩形框;
  4. 对剩下的bbx,循环执行(2)和(3)直到所有的bbx均满足要求(即不能再移除bbx)

NMS的Python代码如下:

代码语言:javascript
复制
import numpy as np

def py_nms(dets, thresh):
    """Pure Python NMS baseline.注意,这里的计算都是在矩阵层面上计算的
    greedily select boxes with high confidence and overlap with current maximum <= thresh
    rule out overlap >= thresh
    :param dets: [[x1, y1, x2, y2 score],] # ndarray, shape(-1,5)
    :param thresh: retain overlap < thresh
    :return: indexes to keep
    """
    # x1、y1、x2、y2、以及score赋值
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]

    # 计算每一个候选框的面积, 纯矩阵加和乘法运算,为何加1?
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    # order是将confidence降序排序后得到的矩阵索引
    order = np.argsort(dets[:, 4])[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # 计算当前概率最大矩形框与其他矩形框的相交框的坐标,会用到numpy的broadcast机制,得到的是向量
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        # 计算相交框的面积,注意矩形框不相交时w或h算出来会是负数,用0代替
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        # 计算重叠度IOU:重叠面积/(面积1+面积2-重叠面积)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # 找到重叠度不高于阈值的矩形框索引
        inds = np.where(iou < thresh)[0]
        # 将order序列更新,由于前面得到的矩形框索引要比矩形框在原order序列中的索引小1,所以要把这个1加回来
        order = order[inds + 1]
    return keep

# test
if __name__ == "__main__":
    dets = np.array([[30, 20, 230, 200, 1],
                     [50, 50, 260, 220, 0.9],
                     [210, 30, 420, 5, 0.8],
                     [430, 280, 460, 360, 0.7]])
    thresh = 0.35
    keep_dets = py_nms(dets, thresh)
    print(keep_dets)
    print(dets[keep_dets])

程序输出如下:

[0, 2, 3]  [[ 30. 20. 230. 200. 1. ]  [210. 30. 420. 5. 0.8]  [430. 280. 460. 360. 0.7]]

另一个版本的 nms 的 python 代码如下:

代码语言:javascript
复制
from __future__ import print_function
import numpy as np
import time

def intersect(box_a, box_b):
    max_xy = np.minimum(box_a[:, 2:], box_b[2:])
    min_xy = np.maximum(box_a[:, :2], box_b[:2])
    inter = np.clip((max_xy - min_xy), a_min=0, a_max=np.inf)
    return inter[:, 0] * inter[:, 1]

def get_iou(box_a, box_b):
    """Compute the jaccard overlap of two sets of boxes.  The jaccard overlap
    is simply the intersection over union of two boxes.
    E.g.:
        A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B)
        The box should be [x1,y1,x2,y2]
    Args:
        box_a: Single numpy bounding box, Shape: [4] or Multiple bounding boxes, Shape: [num_boxes,4]
        box_b: Single numpy bounding box, Shape: [4]
    Return:
        jaccard overlap: Shape: [box_a.shape[0], box_a.shape[1]]
    """
    if box_a.ndim==1:
        box_a=box_a.reshape([1,-1])
    inter = intersect(box_a, box_b)
    area_a = ((box_a[:, 2]-box_a[:, 0]) *
              (box_a[:, 3]-box_a[:, 1]))  # [A,B]
    area_b = ((box_b[2]-box_b[0]) *
              (box_b[3]-box_b[1]))  # [A,B]
    union = area_a + area_b - inter
    return inter / union  # [A,B]

def nms(bboxs,scores,thresh):
    """
    The box should be [x1,y1,x2,y2]
    :param bboxs: multiple bounding boxes, Shape: [num_boxes,4]
    :param scores: The score for the corresponding box
    :return: keep inds
    """
    if len(bboxs)==0:
        return []
    order=scores.argsort()[::-1]
    keep=[]
    while order.size>0:
        i=order[0]
        keep.append(i)
        ious=get_iou(bboxs[order],bboxs[i])
        order=order[ious<=thresh]
    return keep

Soft NMS算法

Soft NMS算法是对NMS算法的改进,是发表在ICCV2017的文章中提出的。NMS算法存在一个问题是可能会把一些目标框给过滤掉,从而导致目标的recall指标比较低。原来的NMS可以描述如下:将IOU大于阈值的窗口的得分全部置为0。

硬NMS

文章的改进有两种形式,一种是线性加权的: 

线性加权形式的soft NMS

另一种是高斯加权的: 

高斯加权形式的soft NMS

注意,这两种形式,思想都是 \(M\) 为当前得分最高框,\(b_i\) 为待处理框, \(b_i\)和 \(M\)的 IOU 越大,bbox 的得分 \(s_i\) 就下降的越厉害( \(N_t\) 为给定阈值)。更多细节可以参考原论文soft nms的python代码如下

代码语言:javascript
复制
def soft_nms(dets, thresh, type='gaussian'):
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    scores = dets[:, 4]

    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]
    scores = scores[order]

    keep = []
    while order.size > 0:
        i = order[0]
        dets[i, 4] = scores[0]
        keep.append(i)

        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        # 计算重叠度IOU:重叠面积/(面积1+面积2-重叠面积)
        ovr = inter / (areas[i] + areas[order[1:]] - inter)

        order = order[1:]
        scores = scores[1:]
        if type == 'linear':
            inds = np.where(ovr >= thresh)[0]
            scores[inds] *= (1 - ovr[inds])
        else:
            scores *= np.exp(- ovr ** 2 / thresh)
        inds = np.where(scores > 1e-3)[0]
        order = order[inds]
        scores = scores[inds]

        tmp = scores.argsort()[::-1]
        order = order[tmp]
        scores = scores[tmp]
    return keep

AP计算

目标检测领域常用的评估标准是:mAP(mean average precision),计算mAP需要先计算AP,计算AP需涉及到precisionrecall的计算,而这两者的计算又需设计TPFPFN的计算。 

AP的计算一般先设计到P-R曲线(precision-recall curve)的绘制,P-R曲线下面与x轴围成的面积称为average precison(AP)。下图是一个二分类问题的P-R曲线: 

分类的PR曲线图

近似计算AP(approximated average precision)

AP = \sum_{k=1}^{N}P(k)\Delta r(k)
  • 这个计算方式称为 approximated 形式的,而插值计算的方式里面这个是最精确的,每个样本点都参与了计算
  • 很显然位于一条竖直线上的点对计算AP没有贡献
  • 这里N为数据总量,k为每个样本点的索引, \(Δr(k)=r(k)−r(k−1)\)

近似计算AP和绘制PR曲线代码如下:

代码语言:javascript
复制
import numpy as np
import matplotlib.pyplot as plt

class_names = ["car", "pedestrians", "bicycle"]

def draw_PR_curve(predict_scores, eval_labels, name, cls_idx=1):
    """calculate AP and draw PR curve, there are 3 types
    Parameters:
    @all_scores: single test dataset predict scores array, (-1, 3)
    @all_labels: single test dataset predict label array, (-1, 3)
    @cls_idx: the serial number of the AP to be calculated, example: 0,1,2,3...
    """
    # print('sklearn Macro-F1-Score:', f1_score(predict_scores, eval_labels, average='macro'))
    global class_names
    fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(15, 10))
    # Rank the predicted scores from large to small, extract their corresponding index(index number), and generate an array
    idx = predict_scores[:, cls_idx].argsort()[::-1]
    eval_labels_descend = eval_labels[idx]
    pos_gt_num = np.sum(eval_labels == cls_idx) # number of all gt

    predict_results = np.ones_like(eval_labels)
    tp_arr = np.logical_and(predict_results == cls_idx, eval_labels_descend == cls_idx) # ndarray
    fp_arr = np.logical_and(predict_results == cls_idx, eval_labels_descend != cls_idx)

    tp_cum = np.cumsum(tp_arr).astype(float) # ndarray, Cumulative sum of array elements.
    fp_cum = np.cumsum(fp_arr).astype(float)

    precision_arr = tp_cum / (tp_cum + fp_cum) # ndarray
    recall_arr = tp_cum / pos_gt_num
    ap = 0.0
    prev_recall = 0
    for p, r in zip(precision_arr, recall_arr):
      ap += p * (r - prev_recall)
      # pdb.set_trace()
      prev_recall = r
    print("------%s, ap: %f-----" % (name, ap))

    fig_label = '[%s, %s] ap=%f' % (name, class_names[cls_idx], ap)
    ax.plot(recall_arr, precision_arr, label=fig_label)

    ax.legend(loc="lower left")
    ax.set_title("PR curve about class: %s" % (class_names[cls_idx]))
    ax.set(xticks=np.arange(0., 1, 0.05), yticks=np.arange(0., 1, 0.05))
    ax.set(xlabel="recall", ylabel="precision", xlim=[0, 1], ylim=[0, 1])

    fig.savefig("./pr-curve-%s.png" % class_names[cls_idx])
    plt.close(fig)

插值计算(Interpolated average precision)

插值计算AP的公式的演变过程我不写了,详情可以参考这篇文章,我这里的公式和图也是参考此文章的。公式如下: 

  • 这是通常意义上的 11points_Interpolated 形式的 AP,选取固定的 {0,0.1,0.2,…,1.0} 11个阈值,这个在PASCAL2007中有使用;
  • 因为参与计算的只有11个点,所以 K=11,称为11points_Interpolated,k为阈值索引;
  • \(Pinterp(k)\) 取第 k 个阈值所对应的样本点之后的样本中的最大值,只不过这里的阈值被限定在了 \({0,0.1,0.2,…,1.0}\) 范围内。

插值计算方式计算AP

从曲线上看,真实 AP< approximated AP < Interpolated AP,11-points Interpolated AP 可能大也可能小,当数据量很多的时候会接近于 Interpolated AP,与 Interpolated AP不同,前面的公式中计算AP时都是对PR曲线的面积估计,PASCAL的论文里给出的公式就更加简单粗暴了,直接计算11个阈值处的precision的平均值。如下图:

11点方式计算AP公式

AP计算代码实现

不同数据集map计算方法

由于 mAP 是数据集中所有类别 AP 值得平均,所以我们要计算mAP,首先得知道某一类别的AP值怎么求。不同数据集的某类别的AP计算方法大同小异,主要分为三种:

(1)在 VOC2007,只需要选取当Recall >= 0, 0.1, 0.2, …, 1共11个点时的Precision最大值,然后AP就是这11个Precision的平均值,map就是所有类别AP值的平均。 (2)在 VOC2010 及以后,需要针对每一个不同的 Recall 值(包括0和1),选取其大于等于这些 Recall 值时的 Precision 最大值,然后计算PR曲线下面积作为AP值,map就是所有类别AP值的平均。 (3)COCO 数据集,设定多个IOU阈值(0.5-0.95,0.05为步长),在每一个IOU阈值下都有某一类别的AP值,然后求不同IOU阈值下的AP平均,就是所求的最终的某类别的AP值。

VOC 数据集中计算 AP 的代码(用的是插值计算方法,代码出自py-faster-rcnn仓库)如下:

1, 在给定 recal 和 precision 的条件下计算AP:

代码语言:javascript
复制
def voc_ap(rec, prec, use_07_metric=False):
    """ 
    ap = voc_ap(rec, prec, [use_07_metric])
    Compute VOC AP given precision and recall.
    If use_07_metric is true, uses the
    VOC 07 11 point method (default:False).
    """
    if use_07_metric:
        # 11 point metric
        ap = 0.
        for t in np.arange(0., 1.1, 0.1):
            if np.sum(rec >= t) == 0:
                p = 0
            else:
                p = np.max(prec[rec >= t])
            ap = ap + p / 11.
    else:
        # correct AP calculation
        # first append sentinel values at the end
        mrec = np.concatenate(([0.], rec, [1.]))
        mpre = np.concatenate(([0.], prec, [0.]))

        # compute the precision envelope
        for i in range(mpre.size - 1, 0, -1):
            mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])

        # to calculate area under PR curve, look for points
        # where X axis (recall) changes value
        i = np.where(mrec[1:] != mrec[:-1])[0]

        # and sum (\Delta recall) * prec
        ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
    return ap

给定目标检测结果文件和测试集标签文件 xml 等计算AP

代码语言:javascript
复制
def parse_rec(filename):
    """ Parse a PASCAL VOC xml file 
    Return : list, element is dict.
    """
    tree = ET.parse(filename)
    objects = []
    for obj in tree.findall('object'):
        obj_struct = {}
        obj_struct['name'] = obj.find('name').text
        obj_struct['pose'] = obj.find('pose').text
        obj_struct['truncated'] = int(obj.find('truncated').text)
        obj_struct['difficult'] = int(obj.find('difficult').text)
        bbox = obj.find('bndbox')
        obj_struct['bbox'] = [int(bbox.find('xmin').text),
                              int(bbox.find('ymin').text),
                              int(bbox.find('xmax').text),
                              int(bbox.find('ymax').text)]
        objects.append(obj_struct)

    return objects


def voc_eval(detpath,
             annopath,
             imagesetfile,
             classname,
             cachedir,
             ovthresh=0.5,
             use_07_metric=False):
    """rec, prec, ap = voc_eval(detpath,
                                annopath,
                                imagesetfile,
                                classname,
                                [ovthresh],
                                [use_07_metric])
    Top level function that does the PASCAL VOC evaluation.
    detpath: Path to detections result file
        detpath.format(classname) should produce the detection results file.
    annopath: Path to annotations file
        annopath.format(imagename) should be the xml annotations file.
    imagesetfile: Text file containing the list of images, one image per line.
    classname: Category name (duh)
    cachedir: Directory for caching the annotations
    [ovthresh]: Overlap threshold (default = 0.5)
    [use_07_metric]: Whether to use VOC07's 11 point AP computation
        (default False)
    """
    # assumes detections are in detpath.format(classname)
    # assumes annotations are in annopath.format(imagename)
    # assumes imagesetfile is a text file with each line an image name
    # cachedir caches the annotations in a pickle file

    # first load gt
    if not os.path.isdir(cachedir):
        os.mkdir(cachedir)
    cachefile = os.path.join(cachedir, '%s_annots.pkl' % imagesetfile)
    # read list of images
    with open(imagesetfile, 'r') as f:
        lines = f.readlines()
    imagenames = [x.strip() for x in lines]

    if not os.path.isfile(cachefile):
        # load annotations
        recs = {}
        for i, imagename in enumerate(imagenames):
            recs[imagename] = parse_rec(annopath.format(imagename))
            if i % 100 == 0:
                print('Reading annotation for {:d}/{:d}'.format(
                    i + 1, len(imagenames)))
        # save
        print('Saving cached annotations to {:s}'.format(cachefile))
        with open(cachefile, 'wb') as f:
            pickle.dump(recs, f)
    else:
        # load
        with open(cachefile, 'rb') as f:
            try:
                recs = pickle.load(f)
            except:
                recs = pickle.load(f, encoding='bytes')

    # extract gt objects for this class
    class_recs = {}
    npos = 0
    for imagename in imagenames:
        R = [obj for obj in recs[imagename] if obj['name'] == classname]
        bbox = np.array([x['bbox'] for x in R])
        difficult = np.array([x['difficult'] for x in R]).astype(np.bool)
        det = [False] * len(R)
        npos = npos + sum(~difficult)
        class_recs[imagename] = {'bbox': bbox,
                                 'difficult': difficult,
                                 'det': det}

    # read dets
    detfile = detpath.format(classname)
    with open(detfile, 'r') as f:
        lines = f.readlines()

    splitlines = [x.strip().split(' ') for x in lines]
    image_ids = [x[0] for x in splitlines]
    confidence = np.array([float(x[1]) for x in splitlines])
    BB = np.array([[float(z) for z in x[2:]] for x in splitlines])

    nd = len(image_ids)
    tp = np.zeros(nd)
    fp = np.zeros(nd)

    if BB.shape[0] > 0:
        # sort by confidence
        sorted_ind = np.argsort(-confidence)
        sorted_scores = np.sort(-confidence)
        BB = BB[sorted_ind, :]
        image_ids = [image_ids[x] for x in sorted_ind]

        # go down dets and mark TPs and FPs
        for d in range(nd):
            R = class_recs[image_ids[d]]
            bb = BB[d, :].astype(float)
            ovmax = -np.inf
            BBGT = R['bbox'].astype(float)

            if BBGT.size > 0:
                # compute overlaps
                # intersection
                ixmin = np.maximum(BBGT[:, 0], bb[0])
                iymin = np.maximum(BBGT[:, 1], bb[1])
                ixmax = np.minimum(BBGT[:, 2], bb[2])
                iymax = np.minimum(BBGT[:, 3], bb[3])
                iw = np.maximum(ixmax - ixmin + 1., 0.)
                ih = np.maximum(iymax - iymin + 1., 0.)
                inters = iw * ih

                # union
                uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
                       (BBGT[:, 2] - BBGT[:, 0] + 1.) *
                       (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)

                overlaps = inters / uni
                ovmax = np.max(overlaps)
                jmax = np.argmax(overlaps)

            if ovmax > ovthresh:
                if not R['difficult'][jmax]:
                    if not R['det'][jmax]:
                        tp[d] = 1.
                        R['det'][jmax] = 1
                    else:
                        fp[d] = 1.
            else:
                fp[d] = 1.

    # compute precision recall
    fp = np.cumsum(fp)
    tp = np.cumsum(tp)
    rec = tp / float(npos)
    # avoid divide by zero in case the first detection matches a difficult
    # ground truth
    prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
    ap = voc_ap(rec, prec, use_07_metric)

    return rec, prec, ap

FLOPs计算

  • FLOPs:floating point operations 指的是浮点运算次数,理解为计算量,可以用来衡量算法/模型的复杂度。
  • FLOPS:(全部大写),Floating-point Operations Per Second,每秒所执行的浮点运算次数,理解为计算速度,是一个衡量硬件性能的指标。
  • MACC:multiply-accumulate,乘法累加次数。

卷积层FLOPs计算

\(FLOPs=(2\times C_i\times K^2-1)\times H\times W\times C_o\)(不考虑bias) \(FLOPs=(2\times C_i\times K^2)\times H\times W\times C_o\)(考虑bias \(MACC=(C_i\times K^2)\times H\times W\times C_o\)(考虑bias)

\(C_i\) 为输入特征图通道数,\(K\) 为过滤器尺寸,\(H,W,C_o\)为输出特征图的高,宽和通道数二维卷积过程如下图所示:

二维卷积是一个相当简单的操作:从卷积核开始,这是一个小的权值矩阵。这个卷积核在 2 维输入数据上滑动,对当前输入的部分元素进行矩阵乘法,然后将结果汇为单个输出像素。

公式解释,参考这里,如下:

理解 FLOPs 的计算公式分两步,括号内是第一步,计算出 output feature map 的一个 pixel,然后再乘以 \(H\times W\times C_o\)​,从而拓展到整个 output feature map。括号内的部分又可以分为两步:\((2\times C_i\times K^2-1)=(C_i\times K^2) + (C_i\times K^2-1)\)。第一项是乘法运算次数,第二项是加法运算次数,因为 \(n\) 个数相加,要加 \(n-1\)次,所以不考虑 bias 的情况下,会有一个-1,如果考虑 bias,刚好中和掉,括号内变为\( (2\times C_i\times K^2)\)。

所以卷积层的 \(FLOPs=(2\times C_{i}\times K^2-1)\times H\times W\times C_o\) ( \(C_i\) 为输入特征图通道数,\(K\) 为过滤器尺寸,\(H, W, C_o\)为输出特征图的高,宽和通道数)。

全连接层的 FLOPs 计算

全连接层的 \(FLOPs = (2I − 1)O\),\(I\) 是输入层的维度,\(O\) 是输出层的维度。

参考资料

PRUNING CONVOLUTIONAL NEURAL NETWORKS FOR RESOURCE EFFICIENT INFERENCE

目标检测度量标准汇总

参考资料

本文参与 腾讯云自媒体同步曝光计划,分享自作者个人站点/博客。
原始发表:2020-11-13,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 前言
  • anchors
  • 交并比IOU
  • 非极大值抑制算法NMS
    • NMS介绍
      • NMS算法
      • Soft NMS算法
      • AP计算
        • 近似计算AP(approximated average precision)
          • 插值计算(Interpolated average precision)
            • AP计算代码实现
              • 不同数据集map计算方法
              • FLOPs计算
                • 卷积层FLOPs计算
                  • 全连接层的 FLOPs 计算
                    • 参考资料
                    • 目标检测度量标准汇总
                    • 参考资料
                    相关产品与服务
                    图像识别
                    腾讯云图像识别基于深度学习等人工智能技术,提供车辆,物体及场景等检测和识别服务, 已上线产品子功能包含车辆识别,商品识别,宠物识别,文件封识别等,更多功能接口敬请期待。
                    领券
                    问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档