Facial Keypoint Detection with a CNN and PyTorch

Author | Krunal Kshirsagar

Source | Medium

Editor | 代码医生团队

What are facial keypoints?

Facial keypoints, also called facial landmarks, mark regions of the face such as the nose, eyes, and mouth. Each face is annotated with 68 keypoints, each given as an (x, y) coordinate. With facial keypoints, applications such as face recognition and emotion recognition become possible.

The dots represent the keypoints
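For orientation, the 68-point convention groups indices into facial regions. The ranges below follow the standard iBUG 68-landmark layout; treat them as a reference sketch rather than something defined by this particular dataset.

# Approximate index ranges for each facial region in the standard
# 68-point landmark convention (iBUG layout) -- a general reference,
# not something specific to this dataset.
FACIAL_REGIONS = {
    'jaw':           range(0, 17),
    'right_eyebrow': range(17, 22),
    'left_eyebrow':  range(22, 27),
    'nose':          range(27, 36),
    'right_eye':     range(36, 42),
    'left_eye':      range(42, 48),
    'mouth':         range(48, 68),
}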

Choosing a dataset:

Since Udacity already provides the YouTube Faces dataset, it will be used here. It is a dataset containing 3,425 face videos designed for studying the problem of unconstrained face recognition in video. The videos have been fed through processing steps and converted into sets of image frames, each containing one face and the associated keypoints.

https://www.cs.tau.ac.il/~wolf/ytfaces/

Training and test data:

This facial keypoints dataset consists of 5,770 color images, all of which are split into either a training or a test set.

3,462 of these images are training images, to be used when creating a model to predict keypoints.

2,308 are test images, which will be used to test the accuracy of the model.

Preprocessing the data:

To feed the data (images) into a neural network, the images must be converted to a fixed size and a standard color range, and the numpy arrays must be converted into PyTorch tensors (for faster computation).

Transforms:

  • Normalize: converts a color image to grayscale with values in the range [0, 1] and normalizes the keypoints to a range of roughly [-1, 1] (see the sketch after this list).
  • Rescale: rescales an image to a desired size.
  • RandomCrop: crops an image randomly.
  • ToTensor: converts a numpy image to a torch image.
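As a reference, here is a minimal sketch of what the Normalize transform might look like, assuming each sample is a dict with 'image' and 'keypoints' keys (as in the dataset class used below); the centering constants 100 and 50 are an assumption borrowed from the Udacity starter code:

import cv2
import numpy as np

class Normalize(object):
    """Convert a color image to grayscale in [0, 1] and normalize keypoints to roughly [-1, 1]."""

    def __call__(self, sample):
        image, key_pts = sample['image'], sample['keypoints']

        # convert the image to grayscale and scale its color range to [0, 1]
        image_copy = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
        image_copy = image_copy / 255.0

        # center and scale the keypoints to roughly [-1, 1]
        # (the mean 100 and scale 50 are assumed starter-code values)
        key_pts_copy = (np.copy(key_pts) - 100) / 50.0

        return {'image': image_copy, 'keypoints': key_pts_copy}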
# test out some of these transforms
rescale = Rescale(100)
crop = RandomCrop(50)
composed = transforms.Compose([Rescale(250),
                               RandomCrop(224)])

# apply the transforms to a sample image
test_num = 500
sample = face_dataset[test_num]
fig = plt.figure()
for i, tx in enumerate([rescale, crop, composed]):
    transformed_sample = tx(sample)
    ax = plt.subplot(1, 3, i + 1)
    plt.tight_layout()
    ax.set_title(type(tx).__name__)
    show_keypoints(transformed_sample['image'], transformed_sample['keypoints'])
plt.show()

Transform output

Creating the transformed dataset:

# define the data transform
# order matters! i.e. rescaling should come before a smaller crop
data_transform = transforms.Compose([Rescale(250),
                                     RandomCrop(224),
                                     Normalize(),
                                     ToTensor()])

# create the transformed dataset
transformed_dataset = FacialKeypointsDataset(csv_file='/data/training_frames_keypoints.csv',
                                             root_dir='/data/training/',
                                             transform=data_transform)

Here, 224 x 224 px is the standardized input image size produced by the transforms, and the output should be 136 values, i.e. 136 / 2 = 68 (x, y) keypoint pairs.

Defining the CNN architecture:

After looking at the data you will work with and understanding the shape of the images and keypoints, you are ready to define a convolutional neural network that can learn from this data.

When defining the layers of this CNN, the only requirements are:

  1. The network takes in a square (same width and height), grayscale image as input.
  2. It ends with a linear layer that represents the keypoints (the last layer outputs 136 values, 2 for each of the 68 keypoint (x, y) pairs).

Shape of a convolutional layer:

K — out_channels: the number of filters in the convolutional layer

F — kernel_size

S — the stride of the convolution

P — padding

W — the width/height (square) of the previous layer

For self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0), with 1 input image channel (grayscale), 32 output channels/feature maps, and a 5x5 square convolution kernel:

output size = (W - F)/S + 1 = (224 - 5)/1 + 1 = 220, so the output tensor for one image will have the dimensions (32, 220, 220).

self.conv1 = nn.Conv2d(1, 32, 5)
# output size = (W-F)/S + 1 = (224-5)/1 + 1 = 220
self.pool1 = nn.MaxPool2d(2, 2)
# 220/2 = 110; the output tensor for one image will have the dimensions (32, 110, 110)

self.conv2 = nn.Conv2d(32, 64, 3)
# output size = (W-F)/S + 1 = (110-3)/1 + 1 = 108
self.pool2 = nn.MaxPool2d(2, 2)
# 108/2 = 54; the output tensor for one image will have the dimensions (64, 54, 54)

self.conv3 = nn.Conv2d(64, 128, 3)
# output size = (W-F)/S + 1 = (54-3)/1 + 1 = 52
self.pool3 = nn.MaxPool2d(2, 2)
# 52/2 = 26; the output tensor for one image will have the dimensions (128, 26, 26)

self.conv4 = nn.Conv2d(128, 256, 3)
# output size = (W-F)/S + 1 = (26-3)/1 + 1 = 24
self.pool4 = nn.MaxPool2d(2, 2)
# 24/2 = 12; the output tensor for one image will have the dimensions (256, 12, 12)

self.conv5 = nn.Conv2d(256, 512, 1)
# output size = (W-F)/S + 1 = (12-1)/1 + 1 = 12
self.pool5 = nn.MaxPool2d(2, 2)
# 12/2 = 6; the output tensor for one image will have the dimensions (512, 6, 6)

# Linear layers
self.fc1 = nn.Linear(512*6*6, 1024)
self.fc2 = nn.Linear(1024, 136)

Dropout layers can be added to regularize a deep neural network. One trick for getting better results is to keep the dropout probability p in the range 0.1 to 0.5, and to use several dropout layers with different values of p.

self.drop1 = nn.Dropout(p = 0.1)
self.drop2 = nn.Dropout(p = 0.2)
self.drop3 = nn.Dropout(p = 0.25)
self.drop4 = nn.Dropout(p = 0.25)
self.drop5 = nn.Dropout(p = 0.3)
self.drop6 = nn.Dropout(p = 0.4)
  • Build the feedforward network with ReLU as the activation function.
  def forward(self, x):
        ## TODO: Define the feedforward behavior of this model
        ## x is the input image and, as an example, here you may choose to include a pool/conv step:
        ## x = self.pool(F.relu(self.conv1(x)))
      
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.drop1(x)
        x = self.pool2(F.relu(self.conv2(x)))
        x = self.drop2(x)
        x = self.pool3(F.relu(self.conv3(x)))
        x = self.drop3(x)
        x = self.pool4(F.relu(self.conv4(x)))
        x = self.drop4(x)
        x = self.pool5(F.relu(self.conv5(x)))
        x = self.drop5(x)
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = self.drop6(x)
        x = self.fc2(x)
        # a modified x, having gone through all the layers of your model, should be returned
        return x
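Once the layers and the forward pass are collected into an nn.Module subclass (called Net here, an assumed name), a quick sanity check is to push a dummy input through the network and confirm the output shape:

import torch

net = Net()  # the CNN assembled from the layers above (assumed class name)
dummy = torch.randn(1, 1, 224, 224)  # a batch of one grayscale 224x224 image
out = net(dummy)
print(out.shape)  # expected: torch.Size([1, 136]), i.e. 68 (x, y) pairs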
  • Create the transformed facial keypoints dataset as before.
# create the transformed dataset
transformed_dataset = FacialKeypointsDataset(csv_file='/data/training_frames_keypoints.csv',
                                             root_dir='/data/training/',
                                             transform=data_transform)
 
 
print('Number of images: ', len(transformed_dataset))
 
# iterate through the transformed dataset and print some stats about the first few samples
for i in range(4):
    sample = transformed_dataset[i]
    print(i, sample['image'].size(), sample['keypoints'].size())
  • Batching and loading the data

Next, having defined the transformed dataset, you can use PyTorch's DataLoader class to load the training data in batches of any size, and to shuffle the data for training the model.

# load training data in batches
batch_size = 10
 
train_loader = DataLoader(transformed_dataset,
                          batch_size=batch_size,
                          shuffle=True,
                          num_workers=0)
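To verify the loader, you can pull one batch and inspect its shapes; the expected sizes below assume the transforms defined earlier (a quick sketch):

# grab a single batch and check its shapes
batch = next(iter(train_loader))
print(batch['image'].shape)      # expected: torch.Size([10, 1, 224, 224])
print(batch['keypoints'].shape)  # expected: torch.Size([10, 68, 2])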
  • Train the CNN model and track the loss
## TODO: Define the loss and optimization
import torch.optim as optim
 
criterion = nn.SmoothL1Loss()
 
optimizer = optim.Adam(net.parameters(), lr = 0.001)

Note: try out other criterion (loss) functions, and keep the learning rate as low as practical; in this case, 0.001.
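For reference, a minimal training loop along these lines might look as follows. This is a sketch that assumes net, criterion, optimizer, and train_loader are defined as above, and that each batch is a dict with 'image' and 'keypoints' keys:

import torch

def train_net(n_epochs):
    net.train()  # enable dropout during training
    for epoch in range(n_epochs):
        running_loss = 0.0
        for batch_i, data in enumerate(train_loader):
            # get the input images and their corresponding keypoints
            images = data['image'].type(torch.FloatTensor)
            key_pts = data['keypoints'].view(data['keypoints'].size(0), -1).type(torch.FloatTensor)

            # forward pass, loss, backward pass, and parameter update
            optimizer.zero_grad()
            output_pts = net(images)
            loss = criterion(output_pts, key_pts)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            if batch_i % 10 == 9:  # print the average loss every 10 batches
                print('Epoch: {}, Batch: {}, Avg. Loss: {}'.format(epoch + 1, batch_i + 1, running_loss / 10))
                running_loss = 0.0

train_net(n_epochs=2)  # one or two epochs are enough for the initial observations below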

  • Training and initial observations

To quickly observe how the model trains and to decide whether you should modify its structure or hyperparameters, it is suggested to start with just one or two epochs. While training, watch how the model's loss behaves over time: does it decrease quickly at first and then slow down? Does it take a while to start decreasing? What happens if you change the batch size of the training data or modify the loss function? And so on.

Use these initial observations to change the model and decide on the best architecture before training for many epochs and creating a final model.

Once you find a good model, save it so that it can be loaded and used later.
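In PyTorch this can be done with the model's state dict; a minimal sketch (the file name here is an arbitrary choice):

import torch

# save the trained parameters
torch.save(net.state_dict(), 'keypoints_model.pt')

# later: rebuild the architecture and load the saved parameters
net = Net()
net.load_state_dict(torch.load('keypoints_model.pt'))
net.eval()  # switch dropout to inference mode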

After the neural network has been trained to detect facial keypoints, it can be applied to any image that contains faces.

  • Detect faces in any image using the Haar cascade detector from the project.
# load in a haar cascade classifier for detecting frontal faces
face_cascade = cv2.CascadeClassifier('detector_architectures/haarcascade_frontalface_default.xml')
 
# run the detector
# the output here is an array of detections; the corners of each detection box
# if necessary, modify these parameters until you successfully identify every face in a given image
faces = face_cascade.detectMultiScale(image, 1.2, 2)
 
# make a copy of the original image to plot detections on
image_with_detections = image.copy()
 
# loop over the detected faces, mark the image where each face is found
for (x,y,w,h) in faces:
    # draw a rectangle around each detected face
    # you may also need to change the width of the rectangle drawn depending on image resolution
    cv2.rectangle(image_with_detections,(x,y),(x+w,y+h),(255,0,0),3)
 
fig = plt.figure(figsize=(9,9))
 
plt.imshow(image_with_detections)

Haar cascade detector

Converting each detected face into an input tensor

The following steps need to be performed for each detected face:

  1. Convert the face from RGB to grayscale.
  2. Normalize the grayscale image so that its color range falls in [0, 1] instead of [0, 255].
  3. Rescale the detected face to the expected square size of the CNN (224 x 224, suggested).
  4. Reshape the numpy image into a torch image.

Detecting and displaying the predicted keypoints

After each face has been appropriately converted into an input tensor for the network, the network can be applied to each face. The output should be the predicted facial keypoints.

These keypoints will need to be "un-normalized" for display, and you may find it helpful to write a helper function such as show_keypoints.

def showpoints(image,keypoints):
    
    plt.figure()
    
    keypoints = keypoints.data.numpy()
    keypoints = keypoints * 60.0 + 68
    keypoints = np.reshape(keypoints, (68, -1))
    
    plt.imshow(image, cmap='gray')
    plt.scatter(keypoints[:, 0], keypoints[:, 1], s=50, marker='.', c='r')
    
 
 
from torch.autograd import Variable
image_copy = np.copy(image)
 
 
# loop over the detected faces from your haar cascade
for (x,y,w,h) in faces:
    
    # Select the region of interest that is the face in the image
    roi = image_copy[y:y+h,x:x+w]
 
    ## TODO: Convert the face region from RGB to grayscale
    roi = cv2.cvtColor(roi, cv2.COLOR_RGB2GRAY)
    image = roi
 
    ## TODO: Normalize the grayscale image so that its color range falls in [0,1] instead of [0,255]
    roi = roi/255.0
    
    ## TODO: Rescale the detected face to be the expected square size for your CNN (224x224, suggested)
    roi = cv2.resize(roi, (224,224))
    
    ## TODO: Reshape the numpy image shape (H x W x C) into a torch image shape (C x H x W)
    roi = np.expand_dims(roi, 0)
    roi = np.expand_dims(roi, 0)
    
    ## TODO: Make facial keypoint predictions using your loaded, trained network
    roi_torch = Variable(torch.from_numpy(roi))
    
    roi_torch = roi_torch.type(torch.FloatTensor)
    keypoints = net(roi_torch)
 
    ## TODO: Display each detected face and the corresponding keypoints        
    showpoints(image, keypoints)

Output:

Detected facial keypoints

Oh! And as far as Voldemort's worry goes that the CNN cannot detect his nose, a piece of advice from Pinocchio might help.

Feel free to check out the project on GitHub.

https://github.com/Noob-can-Compile/Facial_Keypoint_Detection
