A Social Distancing Detector Built with a TensorFlow Object Detection Model, Python, and OpenCV

小白学视觉

Published 2020-07-22 07:35:00

Column: Deep Learning and Computer Vision

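0. Introduction

During the pandemic, while searching GitHub for pretrained TensorFlow models, we found a repository containing 25 pretrained object detection models, together with their performance and speed metrics. With some computer vision knowledge, it seemed interesting to use one of these models to build a social distancing application.

If you have been learning OpenCV, you probably know how powerful it is for small projects. One of its capabilities is bird's-eye-view transformation of an image. A bird's-eye view is a top-down representation of a scene, and producing one is a common task when building autonomous driving applications.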

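Related article: Implementing a bird's-eye-view system for vehicle-mounted cameras

This suggests that applying bird's-eye-view transformation to social distance monitoring can improve the quality of the monitoring.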

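In this article, we show how to combine a deep learning model with some computer vision techniques to build a robust social distancing detector.

The article is structured as follows:

· Model selection

· People detection

· Bird's-eye-view transformation

· Social distance measurement

· Results and improvements

All the code and the installation instructions can be found at the following link: https://github.com/basileroth75/covid-social-distancing-detection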

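1. Model selection

All the models available in the TensorFlow object detection model zoo have been pretrained on the COCO dataset (Common Objects in COntext). The COCO dataset contains 120,000 images with a total of 880,000 labeled objects. The models are pretrained to detect 90 different types of objects. The full list of object types is available in the data section of the GitHub repository: https://github.com/tensorflow/models/blob/master/research/object_detection/data/mscoco_complete_label_map.pbtxt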

Figure: A non-exhaustive list of the available models

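The models differ in prediction speed and in accuracy. I ran a few tests to decide how to trade model quality against prediction speed. Because the goal of the social distancing detector is not real-time analysis, I finally chose faster_rcnn_inception_v2_coco, which has an mAP (the detector's performance on the validation set) of 28 and an execution speed of 58 ms, which makes it a strong choice. It can be downloaded from: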

http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_v2_coco_2018_01_28.tar.gz

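2. People detection

Detecting people with the model above requires a few steps:

· Load the file containing the model into a TensorFlow graph and define the outputs we want from the model.

· For each frame, feed the image into the TensorFlow graph to get the desired outputs.

· Filter out weak predictions and objects that do not need to be detected.

Loading and starting the model:

TensorFlow models work with graphs. The first step is to load the model into a TensorFlow graph, which will contain the detections we need. The next step is to create a session, the entity responsible for executing the operations defined in the graph. For more on graphs and sessions, see https://danijar.com/what-is-a-tensorflow-session/ . Here we implement a class that keeps all the data related to the TensorFlow graph together.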

import numpy as np
import tensorflow as tf


class Model:
    """
    Class that contains the model and all its functions
    """
    def __init__(self, model_path):
        """
        Initialization function
        @ model_path : path to the model
        """
        # Declare detection graph
        self.detection_graph = tf.Graph()
        # Load the model into the tensorflow graph
        with self.detection_graph.as_default():
            od_graph_def = tf.compat.v1.GraphDef()
            with tf.io.gfile.GFile(model_path, 'rb') as file:
                serialized_graph = file.read()
                od_graph_def.ParseFromString(serialized_graph)
                tf.import_graph_def(od_graph_def, name='')

        # Create a session from the detection graph
        self.sess = tf.compat.v1.Session(graph=self.detection_graph)

    def predict(self, img):
        """
        Get the prediction results on 1 frame
        @ img : our img vector
        """
        # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
        img_exp = np.expand_dims(img, axis=0)
        # Pass the inputs and outputs to the session to get the results
        graph = self.detection_graph
        (boxes, scores, classes) = self.sess.run(
            [graph.get_tensor_by_name('detection_boxes:0'),
             graph.get_tensor_by_name('detection_scores:0'),
             graph.get_tensor_by_name('detection_classes:0')],
            feed_dict={graph.get_tensor_by_name('image_tensor:0'): img_exp})
        return (boxes, scores, classes)

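Passing every frame through the model

For each frame that needs processing, the session is executed by calling its run() function. The session itself is created only once, when the model is loaded. The call must specify some parameters: the input the model expects and the outputs we want to get back. In our case, the required outputs are:

· The bounding box coordinates of each object

· The confidence of each prediction (0 to 1)

· The predicted class (0 to 90)

Filtering out weak predictions and irrelevant objects

Figure: People detection results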

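Among the many object classes the model can detect, one is person, and its associated class ID is 1. To exclude weak predictions (threshold: 0.75) and every class other than person, I used an if statement that combines both conditions, so that all other objects are left out of any further computation.

if int(classes[i]) == 1 and scores[i] > 0.75:

However, because these models are already pretrained, they cannot be restricted to detecting only this class (person). As a result, they take a fairly long time to run, since they try to identify all 90 object types in the scene.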
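The filtering step described above can be sketched with NumPy. This is a minimal illustration rather than the code from the repository: the names filter_people and PERSON_CLASS_ID are hypothetical, the sample values are invented, and the sketch assumes the arrays keep the batch dimension returned by predict() (boxes shaped [1, N, 4], scores and classes shaped [1, N]).

```python
import numpy as np

PERSON_CLASS_ID = 1  # class ID for "person", as stated in the text


def filter_people(boxes, scores, classes, min_score=0.75):
    """Keep only confident person detections (hypothetical helper)."""
    # Flatten away the batch dimension so every array has one row per detection
    boxes = np.asarray(boxes).reshape(-1, 4)
    scores = np.asarray(scores).reshape(-1)
    classes = np.asarray(classes).reshape(-1)
    # Same two conditions as the if statement: person class and score above threshold
    keep = (classes.astype(int) == PERSON_CLASS_ID) & (scores > min_score)
    return boxes[keep], scores[keep]


# Invented sample: a confident person, a confident non-person, a weak person
sample_boxes = np.array([[[0.1, 0.1, 0.5, 0.5],
                          [0.2, 0.2, 0.6, 0.6],
                          [0.3, 0.3, 0.7, 0.7]]])
sample_scores = np.array([[0.9, 0.8, 0.5]])
sample_classes = np.array([[1.0, 3.0, 1.0]])
people_boxes, people_scores = filter_people(sample_boxes, sample_scores, sample_classes)
print(people_boxes, people_scores)
```

Filtering with a boolean mask avoids an explicit loop, but it does not make the model itself faster: all 90 classes are still evaluated on every frame.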

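3. Bird's-eye-view transformation

As mentioned in the introduction, a bird's-eye-view transformation gives us a top view of the scene. Thankfully, OpenCV has powerful built-in functions that convert an image taken from a perspective view into a top view. I used Adrian Rosebrock's tutorial to understand how to do this: https://www.pyimagesearch.com/2014/08/25/4-point-opencv-getperspective-transform-example/

The first step is to select 4 points on the original image, which will become the corner points of the transformed view. These points must outline a four-sided region with at least two opposite sides parallel. Otherwise, proportions will not be preserved when the transformation is applied. I have implemented a script in my repository that uses OpenCV's setMouseCallback() function to collect these coordinates. The function that computes the transformation matrix also needs the image dimensions, obtained from the image's image.shape attribute.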

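height, width, _ = image.shape

This returns the height, the width, and the number of color channels, which is not relevant here. Let's see how they are used to compute the transformation matrix: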

import cv2
import numpy as np


def compute_perspective_transform(corner_points, width, height, image):
    """ Compute the transformation matrix
    @ corner_points : 4 corner points selected from the image
    @ height, width : size of the image
    return : transformation matrix and the transformed image
    """
    # Create an array out of the 4 corner points
    corner_points_array = np.float32(corner_points)
    # Destination corners, in the order top-left, top-right, bottom-left, bottom-right
    img_params = np.float32([[0,0],[width,0],[0,height],[width,height]])
    # Compute and return the transformation matrix
    matrix = cv2.getPerspectiveTransform(corner_points_array, img_params)
    img_transformed = cv2.warpPerspective(image, matrix, (width, height))
    return matrix, img_transformed
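To make the role of the four corner points more concrete, here is a NumPy-only sketch of the linear system that a four-point perspective transform solves. It is an illustration under stated assumptions, not OpenCV's implementation: the name four_point_homography, the 640x480 frame, and the corner coordinates are all invented for the example.

```python
import numpy as np


def four_point_homography(src_points, dst_points):
    """Solve for the 3x3 matrix mapping 4 source points onto 4 destination points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src_points, dst_points):
        # Two equations per correspondence, with the bottom-right entry fixed to 1
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=np.float64),
                        np.array(rhs, dtype=np.float64))
    return np.append(h, 1.0).reshape(3, 3)


# Placeholder frame: NumPy image arrays are indexed (height, width, channels)
frame = np.zeros((480, 640, 3), dtype=np.uint8)
height, width = frame.shape[:2]

# Invented trapezoid, ordered top-left, top-right, bottom-left, bottom-right
corner_points = [(200, 100), (440, 100), (0, 480), (640, 480)]
destination = [(0, 0), (width, 0), (0, height), (width, height)]

matrix = four_point_homography(corner_points, destination)
print(matrix)
```

The order of the selected corners has to match the order of the destination corners. Swapping two of them still yields a valid matrix, but the resulting top view is mirrored or twisted.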

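Note that compute_perspective_transform returns the matrix along with the transformed image, because the matrix is used in the next step to compute new coordinates for every detected person. These new coordinates act as the "GPS" coordinates of each person in the frame. Using them is more accurate than using the original points in the frame, because in a perspective view distances are not comparable when people stand in different planes and at different distances from the camera. Compared with using points from the original detection boxes, this can greatly improve the social distance measurement.

For each detected person, the model returns the 2 points needed to build the bounding box: its top-left and bottom-right corners. The centroid of the box is computed as the midpoint between these two points. From this result, I compute the coordinates of the point at the bottom center of the box. I consider this point (called the "base point") to be the best representation of a person's position in the image.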
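The centroid and base point computation can be sketched as follows. This is a minimal illustration with a hypothetical function name (box_to_base_point). It assumes each box arrives as normalized [ymin, xmin, ymax, xmax] values that must be scaled by the frame size, so adapt the unpacking if your boxes are already in pixels.

```python
def box_to_base_point(box, frame_height, frame_width):
    """Return the centroid and the bottom-center point of a box, in pixels."""
    # Assumption: box is (ymin, xmin, ymax, xmax) with values in [0, 1]
    ymin, xmin, ymax, xmax = box
    top, bottom = ymin * frame_height, ymax * frame_height
    left, right = xmin * frame_width, xmax * frame_width
    # Centroid: midpoint between the top-left and bottom-right corners
    centroid = ((left + right) / 2, (top + bottom) / 2)
    # Base point: same x as the centroid, on the bottom edge of the box
    base_point = (centroid[0], bottom)
    return centroid, base_point


# Invented box covering the right half of an 800x400 frame, vertically centered
centroid, base_point = box_to_base_point((0.25, 0.5, 0.75, 1.0), 400, 800)
print(centroid, base_point)
```

The base point is used because it approximates where the person touches the ground, and the ground is the plane that the perspective transform maps to the top view.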

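The transformation matrix is then used to compute the transformed coordinates of every detected base point. This is done on each frame, after people have been detected, with cv2.perspectiveTransform(). The task is implemented as follows: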

def compute_point_perspective_transformation(matrix, list_downoids):
    """ Apply the perspective transformation to every ground point that has been detected on the main frame.
    @ matrix : the 3x3 matrix
    @ list_downoids : list that contains the points to transform
    return : list containing all the new points
    """
    # Compute the new coordinates of our points
    list_points_to_detect = np.float32(list_downoids).reshape(-1, 1, 2)
    transformed_points = cv2.perspectiveTransform(list_points_to_detect, matrix)
    # Loop over the points and add them to the list that will be returned
    transformed_points_list = list()
    for i in range(0, transformed_points.shape[0]):
        transformed_points_list.append([transformed_points[i][0][0], transformed_points[i][0][1]])
    return transformed_points_list
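For intuition, the operation applied to each point can be written out in NumPy: append a 1 to get homogeneous coordinates, multiply by the 3x3 matrix, then divide by the third component. The sketch below is only an illustration. The name apply_homography and the sample matrix are invented, and the project itself relies on the OpenCV call shown above.

```python
import numpy as np


def apply_homography(matrix, points):
    """Map 2D points through a 3x3 perspective matrix (hypothetical helper)."""
    pts = np.asarray(points, dtype=np.float64).reshape(-1, 2)
    # Homogeneous coordinates: (x, y) becomes (x, y, 1)
    homogeneous = np.hstack([pts, np.ones((pts.shape[0], 1))])
    # Row-vector form of matrix @ point, applied to all points at once
    projected = homogeneous @ np.asarray(matrix, dtype=np.float64).T
    # Perspective division by the third component
    return projected[:, :2] / projected[:, 2:3]


# Invented matrix whose last row makes the scale depend on y
sample_matrix = np.array([[1.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0],
                          [0.0, 0.001, 1.0]])
print(apply_homography(sample_matrix, [[100, 0], [100, 1000]]))
```

The division by the third component is what makes the transform non-linear: two points with the same x in the frame can end up at different x positions in the top view, depending on how far they are from the camera.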

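4. Social distance measurement

After compute_point_perspective_transformation has been called on a frame, it returns a list containing all the newly transformed points. From this list, the distance between every pair of points is computed. I used the combinations() function from the itertools library, which yields every possible pair from a list without duplicates. It is well explained in this Stack Overflow question: https://stackoverflow.com/questions/104420/how-to-generate-all-permutations-of-a-list . The rest is simple math: the distance between two points is computed with math.sqrt(). The chosen threshold is 120 pixels, because it is approximately equal to 2 feet in our scene.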

import itertools
import math

# Check if 2 or more people have been detected (otherwise no need to detect)
if len(transformed_downoids) >= 2:
    # Indexes of every pair of points, in the same order as the pairs iterated below
    list_indexes = list(itertools.combinations(range(len(transformed_downoids)), 2))
    # Iterate over every possible pair of points
    for i, pair in enumerate(itertools.combinations(transformed_downoids, r=2)):
        # Check if the distance between the two points is less than the minimum distance chosen
        if math.sqrt((pair[0][0] - pair[1][0])**2 + (pair[0][1] - pair[1][1])**2) < int(distance_minimum):
            # Change the colors of the points that are too close to each other to red
            change_color_topview(pair)
            # Get the equivalent indexes of these points in the original frame and change the color to red
            index_pt1 = list_indexes[i][0]
            index_pt2 = list_indexes[i][1]
            change_color_originalframe(index_pt1, index_pt2)

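Once two points are found to be too close, the circles marking them change from green to red, and the corresponding bounding boxes in the original frame are recolored the same way.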
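The pairwise check can also be isolated into a small function that is easy to test without any drawing code. This is a sketch, not the repository's implementation: find_close_pairs is a hypothetical name, the sample points are invented, and the 120-pixel threshold is the one quoted above.

```python
import itertools
import math


def find_close_pairs(points, min_distance):
    """Return index pairs (i, j), i < j, of points closer than min_distance."""
    close_pairs = []
    # Enumerating first keeps each point's index next to its coordinates
    for (i, p), (j, q) in itertools.combinations(enumerate(points), 2):
        if math.hypot(p[0] - q[0], p[1] - q[1]) < min_distance:
            close_pairs.append((i, j))
    return close_pairs


# Invented top-view coordinates, in pixels
top_view_points = [(0, 0), (100, 0), (300, 0), (350, 0)]
violations = find_close_pairs(top_view_points, min_distance=120)
print(violations)
```

Because combinations() runs over the enumerated points, each pair already carries the indexes needed to recolor the matching bounding boxes in the original frame, so no separate index list is required.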

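5. Results

To recap how the project works:

· First, get the 4 corner points of the view, then apply the perspective transformation to obtain a bird's-eye view and save the transformation matrix.

· Get the bounding box of every person detected in the original frame.

· Compute the lowest point of each box, that is, the point located between the person's feet.

· Apply the transformation matrix to these points to get each person's real "GPS" coordinates.

· Use itertools.combinations() to measure the distance from every point to all the other points in the frame.

· If a social distancing violation is detected, change the color of the bounding box to red.

I used a video from the PETS2009 dataset (http://www.cvg.reading.ac.uk/PETS2009/a.html#s0), which consists of multi-sensor sequences containing different crowd activities. It was originally built for tasks such as counting people and estimating density in crowds. I decided to use the video from the first view, because it is the widest one and offers the best view of the scene. This video presents the results obtained: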

https://youtu.be/3b2GPwN2_I0

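6. Conclusion and improvements

Today, social distancing, together with other basic hygiene measures, is very important for slowing the spread of Covid-19. However, this project is only a proof of concept and, because of ethical and privacy concerns, cannot be used to monitor social distancing in public or private areas.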

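The project has a few minor shortcomings. Possible improvements include:

· Use a faster model to perform real-time social distancing analysis.

· Use a model that is more robust to occlusion.

· Automatic calibration is a well-known problem in computer vision, and solving it could greatly improve the bird's-eye-view transformation across different scenes.

7. References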

https://towardsdatascience.com/analyse-a-soccer-game-using-tensorflow-object-detection-and-opencv-e321c230e8f2

https://www.pyimagesearch.com/2014/08/25/4-point-opencv-getperspective-transform-example/

https://developer.ridgerun.com/wiki/index.php?title=Birds_Eye_View/Introduction/Research

http://www.cvg.reading.ac.uk/PETS2009/a.html#s0