前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >【连载21】Selective Search-3.11

【连载21】Selective Search-3.11

作者头像
lujohn3li
修改2020-03-13 14:09:12
4570
修改2020-03-13 14:09:12
举报

目标检测

目标检测的发展历程大致如下:

Selective Search

对于目标识别任务,比如判断一张图片中有没有车、是什么车,一般需要解决两个问题:目标检测、目标识别。而目标检测任务中通常需要先通过某种方法做图像分割,事先得到候选框;直观的做法是:给定窗口,对整张图片滑动扫描,结束后改变窗口大小重复上面步骤,缺点很明显:重复劳动耗费资源、精度和质量不高等等。 针对上面的问题,一种解决方案是借鉴启发式搜索的方法,充分利用人类的先验知识。J.R.R. Uijlings在《Selective Search for Object Recoginition》提出一种方法:基于数据驱动,与具体类别无关的多种策略融合的启发式生成方法。图片包含各种丰富信息,例如:大小、形状、颜色、纹理、物体重叠关系等,如果只使用一种信息往往不能解决大部分问题,例如:

左边的两只猫可以通过颜色区别而不是通过纹理,右面的变色龙却只能通过纹理区别而不是颜色。

启发式生成设计准则

所以概括来说:

  • 能够捕捉到各种尺度物体,大的、小的、边界清楚的、边界模糊的等等; 多尺度的例子:
  • 策略多样性,采用多样的策略集合共同作用;
  • 计算快速,由于生成候选框只是检测第一步,所以计算上它决不能成为瓶颈。

Selective Search

基于以上准则设计Selective Search算法:

  • 采用层次分组算法解决尺度问题。 引入图像分割中的自下而上分组思想,由于整个过程是层次的,在将整个图合并成一个大的区域的过程中会输出不同尺度的多个子区域。整个过程如下: 1、利用《Efficient Graph-Based Image Segmentation》(基本思想:将图像中每个像素表示为图上的一个节点,用于连接不同节点的无向边都有一个权重,这个权重表示两个节点之间的不相似度,通过贪心算法利用最小生成树做图像分割)生成初始候选区域; 2、采用贪心算法合并区域,计算任意两个领域的相似度,把达到阈值的合并,再计算新区域和其所有领域的相似度,循环迭代,直到整个图变成了一个区域,算法如下:
  • 多样化策略 三个方面:使用多种颜色空间、使用多种相似度计算方法、搜索起始区域不固定。 1、颜色空间有很多种:RGB、HSV、Lab等等,不是论文重点; 2、相似度衡量算法,结合了4重策略: ◆ 颜色相似度 以RGB为例,使用L1-norm归一化每个图像通道的色彩直方图(bins=25),每个区域被表示为25×3维向量:; 颜色相似度定义为: 区域合并后对新的区域计算其色彩直方图: 新区域的大小为: ◆ 纹理相似度 使用快速生成的类SIFT特征,对每个颜色通道在8个方向上应用方差为1的高斯滤波器,对每个颜色通道的每个方向提取bins=10的直方图,所以整个纹理向量维度为:3×8×10=240,表示为:; 纹理相似度定义为: ◆ 大小相似度 该策略希望小的区域能尽早合并,让合并操作比较平滑,防止出现某个大区域逐步吞并其他小区域的情况。相似度定义为: 其中size(im)为图像包含像素点数目。 ◆ 区域规则度相似度 能够框住合并后的两个区域的矩形大小越小说明两个区域的合并越规则,如: 区域规则度相似度定义为:

最终相似度为所有策略加权和,文中采用等权方式:

使用Selective Search做目标识别

训练过程包含:提取候选框、提取特征、生成正负样本、训练模型,图示如下:

早期图像特征提取往往是各种HOG特征或BoW特征,现在CNN特征几乎一统天下。 检测定位效果评价采用Average Best Overlap(ABO)和Mean Average Best Overlap(MABO):

其中:为类别标注、为类别下的ground truth,为通过Selective Search生成的候选框。

)

代码实践

参见AlpacaDB。

  • selectivesearch.py
# -*- coding: utf-8 -*-
import skimage.io
import skimage.feature
import skimage.color
import skimage.transform
import skimage.util
import skimage.segmentation
import numpy
# "Selective Search for Object Recognition" by J.R.R. Uijlings et al.
#
#  - Modified version with LBP extractor for texture vectorization
def _generate_segments(im_orig, scale, sigma, min_size):
    """
        segment smallest regions by the algorithm of Felzenswalb and
        Huttenlocher
    """
    # open the Image
    im_mask = skimage.segmentation.felzenszwalb(
        skimage.util.img_as_float(im_orig), scale=scale, sigma=sigma,
        min_size=min_size)
    # merge mask channel to the image as a 4th channel
    im_orig = numpy.append(
        im_orig, numpy.zeros(im_orig.shape[:2])[:, :, numpy.newaxis], axis=2)
    im_orig[:, :, 3] = im_mask
    return im_orig
def _sim_colour(r1, r2):
    """
        calculate the sum of histogram intersection of colour
    """
    return sum([min(a, b) for a, b in zip(r1["hist_c"], r2["hist_c"])])
def _sim_texture(r1, r2):
    """
        calculate the sum of histogram intersection of texture
    """
    return sum([min(a, b) for a, b in zip(r1["hist_t"], r2["hist_t"])])
def _sim_size(r1, r2, imsize):
    """
        calculate the size similarity over the image
    """
    return 1.0 - (r1["size"] + r2["size"]) / imsize
def _sim_fill(r1, r2, imsize):
    """
        calculate the fill similarity over the image
    """
    bbsize = (
        (max(r1["max_x"], r2["max_x"]) - min(r1["min_x"], r2["min_x"]))
        * (max(r1["max_y"], r2["max_y"]) - min(r1["min_y"], r2["min_y"]))
    )
    return 1.0 - (bbsize - r1["size"] - r2["size"]) / imsize
def _calc_sim(r1, r2, imsize):
    return (_sim_colour(r1, r2) + _sim_texture(r1, r2)
            + _sim_size(r1, r2, imsize) + _sim_fill(r1, r2, imsize))
def _calc_colour_hist(img):
    """
        calculate colour histogram for each region
        the size of output histogram will be BINS * COLOUR_CHANNELS(3)
        number of bins is 25 as same as [uijlings_ijcv2013_draft.pdf]
        extract HSV
    """
    BINS = 25
    hist = numpy.array([])
    for colour_channel in (0, 1, 2):
        # extracting one colour channel
        c = img[:, colour_channel]
        # calculate histogram for each colour and join to the result
        hist = numpy.concatenate(
            [hist] + [numpy.histogram(c, BINS, (0.0, 255.0))[0]])
    # L1 normalize
    hist = hist / len(img)
    return hist
def _calc_texture_gradient(img):
    """
        calculate texture gradient for entire image
        The original SelectiveSearch algorithm proposed Gaussian derivative
        for 8 orientations, but we use LBP instead.
        output will be [height(*)][width(*)]
    """
    ret = numpy.zeros((img.shape[0], img.shape[1], img.shape[2]))
    for colour_channel in (0, 1, 2):
        ret[:, :, colour_channel] = skimage.feature.local_binary_pattern(
            img[:, :, colour_channel], 8, 1.0)
    return ret
def _calc_texture_hist(img):
    """
        calculate texture histogram for each region
        calculate the histogram of gradient for each colours
        the size of output histogram will be
            BINS * ORIENTATIONS * COLOUR_CHANNELS(3)
    """
    BINS = 10
    hist = numpy.array([])
    for colour_channel in (0, 1, 2):
        # mask by the colour channel
        fd = img[:, colour_channel]
        # calculate histogram for each orientation and concatenate them all
        # and join to the result
        hist = numpy.concatenate(
            [hist] + [numpy.histogram(fd, BINS, (0.0, 1.0))[0]])
    # L1 Normalize
    hist = hist / len(img)
    return hist
def _extract_regions(img):
    R = {}
    # get hsv image
    hsv = skimage.color.rgb2hsv(img[:, :, :3])
    # pass 1: count pixel positions
    for y, i in enumerate(img):
        for x, (r, g, b, l) in enumerate(i):
            # initialize a new region
            if l not in R:
                R[l] = {
                    "min_x": 0xffff, "min_y": 0xffff,
                    "max_x": 0, "max_y": 0, "labels": [l]}
            # bounding box
            if R[l]["min_x"] > x:
                R[l]["min_x"] = x
            if R[l]["min_y"] > y:
                R[l]["min_y"] = y
            if R[l]["max_x"] < x:
                R[l]["max_x"] = x
            if R[l]["max_y"] < y:
                R[l]["max_y"] = y
    # pass 2: calculate texture gradient
    tex_grad = _calc_texture_gradient(img)
    # pass 3: calculate colour histogram of each region
    for k, v in R.items():
        # colour histogram
        masked_pixels = hsv[:, :, :][img[:, :, 3] == k]
        R[k]["size"] = len(masked_pixels / 4)
        R[k]["hist_c"] = _calc_colour_hist(masked_pixels)
        # texture histogram
        R[k]["hist_t"] = _calc_texture_hist(tex_grad[:, :][img[:, :, 3] == k])
    return R
def _extract_neighbours(regions):
    def intersect(a, b):
        if (a["min_x"] < b["min_x"] < a["max_x"]
                and a["min_y"] < b["min_y"] < a["max_y"]) or (
            a["min_x"] < b["max_x"] < a["max_x"]
                and a["min_y"] < b["max_y"] < a["max_y"]) or (
            a["min_x"] < b["min_x"] < a["max_x"]
                and a["min_y"] < b["max_y"] < a["max_y"]) or (
            a["min_x"] < b["max_x"] < a["max_x"]
                and a["min_y"] < b["min_y"] < a["max_y"]):
            return True
        return False
    R = regions.items()
    neighbours = []
    for cur, a in enumerate(R[:-1]):
        for b in R[cur + 1:]:
            if intersect(a[1], b[1]):
                neighbours.append((a, b))
    return neighbours
def _merge_regions(r1, r2):
    new_size = r1["size"] + r2["size"]
    rt = {
        "min_x": min(r1["min_x"], r2["min_x"]),
        "min_y": min(r1["min_y"], r2["min_y"]),
        "max_x": max(r1["max_x"], r2["max_x"]),
        "max_y": max(r1["max_y"], r2["max_y"]),
        "size": new_size,
        "hist_c": (
            r1["hist_c"] * r1["size"] + r2["hist_c"] * r2["size"]) / new_size,
        "hist_t": (
            r1["hist_t"] * r1["size"] + r2["hist_t"] * r2["size"]) / new_size,
        "labels": r1["labels"] + r2["labels"]
    }
    return rt
def selective_search(
        im_orig, scale=1.0, sigma=0.8, min_size=50):
    '''Selective Search
    Parameters
    ----------
        im_orig : ndarray
            Input image
        scale : int
            Free parameter. Higher means larger clusters in felzenszwalb segmentation.
        sigma : float
            Width of Gaussian kernel for felzenszwalb segmentation.
        min_size : int
            Minimum component size for felzenszwalb segmentation.
    Returns
    -------
        img : ndarray
            image with region label
            region label is stored in the 4th value of each pixel [r,g,b,(region)]
        regions : array of dict
            [
                {
                    'rect': (left, top, right, bottom),
                    'labels': [...]
                },
                ...
            ]
    '''
    assert im_orig.shape[2] == 3, "3ch image is expected"
    # load image and get smallest regions
    # region label is stored in the 4th value of each pixel [r,g,b,(region)]
    img = _generate_segments(im_orig, scale, sigma, min_size)
    if img is None:
        return None, {}
    imsize = img.shape[0] * img.shape[1]
    R = _extract_regions(img)
    # extract neighbouring information
    neighbours = _extract_neighbours(R)
    # calculate initial similarities
    S = {}
    for (ai, ar), (bi, br) in neighbours:
        S[(ai, bi)] = _calc_sim(ar, br, imsize)
    # hierarchal search
    while S != {}:
        # get highest similarity
        i, j = sorted(S.items(), cmp=lambda a, b: cmp(a[1], b[1]))[-1][0]
        # merge corresponding regions
        t = max(R.keys()) + 1.0
        R[t] = _merge_regions(R[i], R[j])
        # mark similarities for regions to be removed
        key_to_delete = []
        for k, v in S.items():
            if (i in k) or (j in k):
                key_to_delete.append(k)
        # remove old similarities of related regions
        for k in key_to_delete:
            del S[k]
        # calculate similarity set with the new region
        for k in filter(lambda a: a != (i, j), key_to_delete):
            n = k[1] if k[0] in (i, j) else k[0]
            S[(t, n)] = _calc_sim(R[t], R[n], imsize)
    regions = []
    for k, r in R.items():
        regions.append({
            'rect': (
                r['min_x'], r['min_y'],
                r['max_x'] - r['min_x'], r['max_y'] - r['min_y']),
            'size': r['size'],
            'labels': r['labels']
        })
    return img, regions		
  • example.py
# -*- coding: utf-8 -*-
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import skimage.data
import skimage.io
from skimage.io import use_plugin,imread
import matplotlib.patches as mpatches
from matplotlib.pyplot import savefig
import selectivesearch
def main():
    # loading astronaut image
    #img = skimage.data.astronaut()
    use_plugin('pil')
    img = imread('car.jpg', as_grey=False)
    # perform selective search
    img_lbl, regions = selectivesearch.selective_search(
        img, scale=500, sigma=0.9, min_size=10)
    candidates = set()
    for r in regions:
        # excluding same rectangle (with different segments)
        if r['rect'] in candidates:
            continue
        # excluding regions smaller than 2000 pixels
        if r['size'] < 2000:
            continue
        # distorted rects
        x, y, w, h = r['rect']
        if w / h > 1.2 or h / w > 1.2:
            continue
        candidates.add(r['rect'])
    # draw rectangles on the original image
    plt.figure()
    fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 6))
    ax.imshow(img)
    for x, y, w, h in candidates:
        print x, y, w, h
        rect = mpatches.Rectangle(
            (x, y), w, h, fill=False, edgecolor='red', linewidth=1)
        ax.add_patch(rect)
    #plt.show()
    savefig('MyFig.jpg')
if __name__ == "__main__":
    main()		

car.jpg原图如下:

结果图如下:

本文参与 腾讯云自媒体分享计划,分享自微信公众号。
原始发表:2020-03-11,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 机器学习爱好者社区 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 目标检测
    • Selective Search
      • 启发式生成设计准则
      • Selective Search
      • 使用Selective Search做目标识别
      • 代码实践
相关产品与服务
图像识别
腾讯云图像识别基于深度学习等人工智能技术,提供车辆,物体及场景等检测和识别服务, 已上线产品子功能包含车辆识别,商品识别,宠物识别,文件封识别等,更多功能接口敬请期待。
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档