Facial Keypoint Detection with a CNN and PyTorch

Author | Krunal Kshirsagar

Source | Medium

Editor | 代码医生团队

What are facial keypoints?

Facial keypoints, also called facial landmarks, mark regions of the face such as the nose, eyes, and mouth. Each face is annotated with 68 keypoints, each given as an (x, y) coordinate. With facial keypoints, applications such as face recognition and emotion recognition become possible.

The dots represent the keypoints
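For orientation, the 68-point convention groups indices into facial regions. The ranges below follow the standard iBUG 68-landmark layout; treat them as a reference sketch rather than something defined by this particular dataset.

# Approximate index ranges for each facial region in the standard
# 68-point landmark convention (iBUG layout) -- a general reference,
# not something specific to this dataset.
FACIAL_REGIONS = {
    'jaw':           range(0, 17),
    'right_eyebrow': range(17, 22),
    'left_eyebrow':  range(22, 27),
    'nose':          range(27, 36),
    'right_eye':     range(36, 42),
    'left_eye':      range(42, 48),
    'mouth':         range(48, 68),
}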

Choosing a dataset:

Since Udacity already provides the YouTube Faces dataset, it will be used here. It is a dataset containing 3,425 face videos designed for studying the problem of unconstrained face recognition in video. The videos have been fed through processing steps and converted into sets of image frames, each containing one face and the associated keypoints.

https://www.cs.tau.ac.il/~wolf/ytfaces/

Training and test data:

This facial keypoints dataset consists of 5,770 color images, all of which are split into either a training or a test set.

3,462 of these images are training images, to be used when creating a model to predict keypoints.

2,308 are test images, which will be used to test the accuracy of the model.

Preprocessing the data:

To feed the data (images) into a neural network, the images must be converted to a fixed size and a standard color range, and the numpy arrays must be converted into PyTorch tensors (for faster computation).

Transforms:

  • Normalize: converts a color image to grayscale with values in the range [0, 1] and normalizes the keypoints to a range of roughly [-1, 1] (see the sketch after this list).
  • Rescale: rescales an image to a desired size.
  • RandomCrop: crops an image randomly.
  • ToTensor: converts a numpy image to a torch image.
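As a reference, here is a minimal sketch of what the Normalize transform might look like, assuming each sample is a dict with 'image' and 'keypoints' keys (as in the dataset class used below); the centering constants 100 and 50 are an assumption borrowed from the Udacity starter code:

import cv2
import numpy as np

class Normalize(object):
    """Convert a color image to grayscale in [0, 1] and normalize keypoints to roughly [-1, 1]."""

    def __call__(self, sample):
        image, key_pts = sample['image'], sample['keypoints']

        # convert the image to grayscale and scale its color range to [0, 1]
        image_copy = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
        image_copy = image_copy / 255.0

        # center and scale the keypoints to roughly [-1, 1]
        # (the mean 100 and scale 50 are assumed starter-code values)
        key_pts_copy = (np.copy(key_pts) - 100) / 50.0

        return {'image': image_copy, 'keypoints': key_pts_copy}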
# test out some of these transforms
rescale = Rescale(100)
crop = RandomCrop(50)
composed = transforms.Compose([Rescale(250),
                               RandomCrop(224)])

# apply the transforms to a sample image
test_num = 500
sample = face_dataset[test_num]
fig = plt.figure()
for i, tx in enumerate([rescale, crop, composed]):
    transformed_sample = tx(sample)
    ax = plt.subplot(1, 3, i + 1)
    plt.tight_layout()
    ax.set_title(type(tx).__name__)
    show_keypoints(transformed_sample['image'], transformed_sample['keypoints'])
plt.show()

Transform output

Creating the transformed dataset:

# define the data transform
# order matters! i.e. rescaling should come before a smaller crop
data_transform = transforms.Compose([Rescale(250),
                                     RandomCrop(224),
                                     Normalize(),
                                     ToTensor()])

# create the transformed dataset
transformed_dataset = FacialKeypointsDataset(csv_file='/data/training_frames_keypoints.csv',
                                             root_dir='/data/training/',
                                             transform=data_transform)

Here, 224 x 224 px is the standardized input image size produced by the transforms, and the output should be 136 values, i.e. 136 / 2 = 68 (x, y) keypoint pairs.

Defining the CNN architecture:

After looking at the data you will work with and understanding the shape of the images and keypoints, you are ready to define a convolutional neural network that can learn from this data.

When defining the layers of this CNN, the only requirements are:

  1. The network takes in a square (same width and height), grayscale image as input.
  2. It ends with a linear layer that represents the keypoints (the last layer outputs 136 values, 2 for each of the 68 keypoint (x, y) pairs).

Shape of a convolutional layer:

K — out_channels: the number of filters in the convolutional layer

F — kernel_size

S — the stride of the convolution

P — padding

W — the width/height (square) of the previous layer

For self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0), with 1 input image channel (grayscale), 32 output channels/feature maps, and a 5x5 square convolution kernel:

output size = (W - F)/S + 1 = (224 - 5)/1 + 1 = 220, so the output tensor for one image will have the dimensions (32, 220, 220).

self.conv1 = nn.Conv2d(1, 32, 5)
# output size = (W-F)/S + 1 = (224-5)/1 + 1 = 220
self.pool1 = nn.MaxPool2d(2, 2)
# 220/2 = 110; the output tensor for one image will have the dimensions (32, 110, 110)

self.conv2 = nn.Conv2d(32, 64, 3)
# output size = (W-F)/S + 1 = (110-3)/1 + 1 = 108
self.pool2 = nn.MaxPool2d(2, 2)
# 108/2 = 54; the output tensor for one image will have the dimensions (64, 54, 54)

self.conv3 = nn.Conv2d(64, 128, 3)
# output size = (W-F)/S + 1 = (54-3)/1 + 1 = 52
self.pool3 = nn.MaxPool2d(2, 2)
# 52/2 = 26; the output tensor for one image will have the dimensions (128, 26, 26)

self.conv4 = nn.Conv2d(128, 256, 3)
# output size = (W-F)/S + 1 = (26-3)/1 + 1 = 24
self.pool4 = nn.MaxPool2d(2, 2)
# 24/2 = 12; the output tensor for one image will have the dimensions (256, 12, 12)

self.conv5 = nn.Conv2d(256, 512, 1)
# output size = (W-F)/S + 1 = (12-1)/1 + 1 = 12
self.pool5 = nn.MaxPool2d(2, 2)
# 12/2 = 6; the output tensor for one image will have the dimensions (512, 6, 6)

# Linear layers
self.fc1 = nn.Linear(512*6*6, 1024)
self.fc2 = nn.Linear(1024, 136)

Dropout layers can be added to regularize a deep neural network. One trick for getting better results is to keep the dropout probability p in the range 0.1 to 0.5, and to use several dropout layers with different values of p.

self.drop1 = nn.Dropout(p = 0.1)
self.drop2 = nn.Dropout(p = 0.2)
self.drop3 = nn.Dropout(p = 0.25)
self.drop4 = nn.Dropout(p = 0.25)
self.drop5 = nn.Dropout(p = 0.3)
self.drop6 = nn.Dropout(p = 0.4)
  • Build the feedforward network with ReLU as the activation function.
  def forward(self, x):
        ## TODO: Define the feedforward behavior of this model
        ## x is the input image and, as an example, here you may choose to include a pool/conv step:
        ## x = self.pool(F.relu(self.conv1(x)))
      
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.drop1(x)
        x = self.pool2(F.relu(self.conv2(x)))
        x = self.drop2(x)
        x = self.pool3(F.relu(self.conv3(x)))
        x = self.drop3(x)
        x = self.pool4(F.relu(self.conv4(x)))
        x = self.drop4(x)
        x = self.pool5(F.relu(self.conv5(x)))
        x = self.drop5(x)
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = self.drop6(x)
        x = self.fc2(x)
        # a modified x, having gone through all the layers of your model, should be returned
        return x
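Once the layers and the forward pass are collected into an nn.Module subclass (called Net here, an assumed name), a quick sanity check is to push a dummy input through the network and confirm the output shape:

import torch

net = Net()  # the CNN assembled from the layers above (assumed class name)
dummy = torch.randn(1, 1, 224, 224)  # a batch of one grayscale 224x224 image
out = net(dummy)
print(out.shape)  # expected: torch.Size([1, 136]), i.e. 68 (x, y) pairs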
  • Create the transformed facial keypoints dataset as before.
# create the transformed dataset
transformed_dataset = FacialKeypointsDataset(csv_file='/data/training_frames_keypoints.csv',
                                             root_dir='/data/training/',
                                             transform=data_transform)
 
 
print('Number of images: ', len(transformed_dataset))
 
# iterate through the transformed dataset and print some stats about the first few samples
for i in range(4):
    sample = transformed_dataset[i]
    print(i, sample['image'].size(), sample['keypoints'].size())
  • Batching and loading the data

Next, having defined the transformed dataset, you can use PyTorch's DataLoader class to load the training data in batches of any size, and to shuffle the data for training the model.

# load training data in batches
batch_size = 10
 
train_loader = DataLoader(transformed_dataset,
                          batch_size=batch_size,
                          shuffle=True,
                          num_workers=0)
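To verify the loader, you can pull one batch and inspect its shapes; the expected sizes below assume the transforms defined earlier (a quick sketch):

# grab a single batch and check its shapes
batch = next(iter(train_loader))
print(batch['image'].shape)      # expected: torch.Size([10, 1, 224, 224])
print(batch['keypoints'].shape)  # expected: torch.Size([10, 68, 2])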
  • Train the CNN model and track the loss
## TODO: Define the loss and optimization
import torch.optim as optim
 
criterion = nn.SmoothL1Loss()
 
optimizer = optim.Adam(net.parameters(), lr = 0.001)

Note: try out other criterion (loss) functions, and keep the learning rate as low as practical; in this case, 0.001.
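For reference, a minimal training loop along these lines might look as follows. This is a sketch that assumes net, criterion, optimizer, and train_loader are defined as above, and that each batch is a dict with 'image' and 'keypoints' keys:

import torch

def train_net(n_epochs):
    net.train()  # enable dropout during training
    for epoch in range(n_epochs):
        running_loss = 0.0
        for batch_i, data in enumerate(train_loader):
            # get the input images and their corresponding keypoints
            images = data['image'].type(torch.FloatTensor)
            key_pts = data['keypoints'].view(data['keypoints'].size(0), -1).type(torch.FloatTensor)

            # forward pass, loss, backward pass, and parameter update
            optimizer.zero_grad()
            output_pts = net(images)
            loss = criterion(output_pts, key_pts)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            if batch_i % 10 == 9:  # print the average loss every 10 batches
                print('Epoch: {}, Batch: {}, Avg. Loss: {}'.format(epoch + 1, batch_i + 1, running_loss / 10))
                running_loss = 0.0

train_net(n_epochs=2)  # one or two epochs are enough for the initial observations below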

  • Training and initial observations

To quickly observe how the model trains and to decide whether you should modify its structure or hyperparameters, it is suggested to start with just one or two epochs. While training, watch how the model's loss behaves over time: does it decrease quickly at first and then slow down? Does it take a while to start decreasing? What happens if you change the batch size of the training data or modify the loss function? And so on.

Use these initial observations to change the model and decide on the best architecture before training for many epochs and creating a final model.

Once you find a good model, save it so that it can be loaded and used later.
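In PyTorch this can be done with the model's state dict; a minimal sketch (the file name here is an arbitrary choice):

import torch

# save the trained parameters
torch.save(net.state_dict(), 'keypoints_model.pt')

# later: rebuild the architecture and load the saved parameters
net = Net()
net.load_state_dict(torch.load('keypoints_model.pt'))
net.eval()  # switch dropout to inference mode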

After the neural network has been trained to detect facial keypoints, it can be applied to any image that contains faces.

  • Detect faces in any image using the Haar cascade detector from the project.
# load in a haar cascade classifier for detecting frontal faces
face_cascade = cv2.CascadeClassifier('detector_architectures/haarcascade_frontalface_default.xml')
 
# run the detector
# the output here is an array of detections; the corners of each detection box
# if necessary, modify these parameters until you successfully identify every face in a given image
faces = face_cascade.detectMultiScale(image, 1.2, 2)
 
# make a copy of the original image to plot detections on
image_with_detections = image.copy()
 
# loop over the detected faces, mark the image where each face is found
for (x,y,w,h) in faces:
    # draw a rectangle around each detected face
    # you may also need to change the width of the rectangle drawn depending on image resolution
    cv2.rectangle(image_with_detections,(x,y),(x+w,y+h),(255,0,0),3)
 
fig = plt.figure(figsize=(9,9))
 
plt.imshow(image_with_detections)

Haar cascade detector

Converting each detected face into an input tensor

The following steps need to be performed for each detected face:

  1. Convert the face from RGB to grayscale.
  2. Normalize the grayscale image so that its color range falls in [0, 1] instead of [0, 255].
  3. Rescale the detected face to the expected square size of the CNN (224 x 224, suggested).
  4. Reshape the numpy image into a torch image.

Detecting and displaying the predicted keypoints

After each face has been appropriately converted into an input tensor for the network, the network can be applied to each face. The output should be the predicted facial keypoints.

These keypoints will need to be "un-normalized" for display, and you may find it helpful to write a helper function such as show_keypoints.

def showpoints(image,keypoints):
    
    plt.figure()
    
    keypoints = keypoints.data.numpy()
    keypoints = keypoints * 60.0 + 68
    keypoints = np.reshape(keypoints, (68, -1))
    
    plt.imshow(image, cmap='gray')
    plt.scatter(keypoints[:, 0], keypoints[:, 1], s=50, marker='.', c='r')
    
 
 
from torch.autograd import Variable
image_copy = np.copy(image)
 
 
# loop over the detected faces from your haar cascade
for (x,y,w,h) in faces:
    
    # Select the region of interest that is the face in the image
    roi = image_copy[y:y+h,x:x+w]
 
    ## TODO: Convert the face region from RGB to grayscale
    roi = cv2.cvtColor(roi, cv2.COLOR_RGB2GRAY)
    image = roi
 
    ## TODO: Normalize the grayscale image so that its color range falls in [0,1] instead of [0,255]
    roi = roi/255.0
    
    ## TODO: Rescale the detected face to be the expected square size for your CNN (224x224, suggested)
    roi = cv2.resize(roi, (224,224))
    
    ## TODO: Reshape the numpy image shape (H x W x C) into a torch image shape (C x H x W)
    roi = np.expand_dims(roi, 0)
    roi = np.expand_dims(roi, 0)
    
    ## TODO: Make facial keypoint predictions using your loaded, trained network
    roi_torch = Variable(torch.from_numpy(roi))
    
    roi_torch = roi_torch.type(torch.FloatTensor)
    keypoints = net(roi_torch)
 
    ## TODO: Display each detected face and the corresponding keypoints        
    showpoints(image, keypoints)

Output:

Detected facial keypoints

Oh! And as far as Voldemort's worry goes that the CNN cannot detect his nose, a piece of advice from Pinocchio might help.

Feel free to check out the project on GitHub.

https://github.com/Noob-can-Compile/Facial_Keypoint_Detection
