Wang Fei / Chen Liren / Li Cheng / Huang Shiyao / Chen Yanjie / Qian Chen / Loy Chen Change
Why we recommend it
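This paper makes the following contributions to face recognition: (1) it extracts clean subsets from existing large-scale face datasets (MegaFace and MS-Celeb-1M) and introduces a new noise-controlled face dataset, IMDb-Face; (2) using both the original datasets and the cleaned subsets, it analyzes the properties and sources of label noise in MegaFace and MS-Celeb-1M in depth, and finds that training on clean subsets substantially improves face recognition accuracy; (3) it proposes an annotation pipeline for data cleaning, and an extensive user study shows that the pipeline is efficient and controllable. The IMDb-Face dataset is open-sourced at: https://github.com/fwang91/IMDb-Face.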
Recommended by AI研习社 (AI Yanxishe) user @约翰尼•德普
Abstract
The growing scale of face recognition datasets empowers us to train strong convolutional networks for face recognition. While a variety of architectures and loss functions have been devised, we still have a limited understanding of the source and consequence of label noise inherent in existing datasets. We make the following contributions: 1) We contribute cleaned subsets of popular face databases, i.e., the MegaFace and MS-Celeb-1M datasets, and build a new large-scale noise-controlled IMDb-Face dataset. 2) With the original datasets and cleaned subsets, we profile and analyze label noise properties of MegaFace and MS-Celeb-1M. We show that a few orders of magnitude more samples are needed to achieve the same accuracy yielded by a clean subset. 3) We study the association between different types of noise, i.e., label flips and outliers, and the accuracy of face recognition models. 4) We investigate ways to improve data cleanliness, including a comprehensive user study on the influence of data labeling strategies on annotation accuracy.
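The two noise types named in contribution 3 can be illustrated with a small simulation that injects each kind of noise into a clean labeled set. This is a hypothetical sketch, not the authors' code. It assumes a "label flip" means a sample relabeled as a different identity that exists in the dataset, and an "outlier" means a sample whose image belongs to no identity in the dataset (for example a person outside it, or a non-face image) while still carrying an in-dataset label. The function names, the `(image_id, label)` representation and the `distractors` pool are illustrative assumptions.

```python
import random
from typing import List, Tuple

# Hypothetical representation: (image_id, identity_label) with labels in [0, num_classes).
Sample = Tuple[str, int]


def inject_label_flips(data: List[Sample], num_classes: int, rate: float,
                       rng: random.Random) -> List[Sample]:
    """Relabel roughly `rate` of the samples as another identity in the dataset."""
    assert num_classes >= 2, "a flip needs at least one other identity"
    noisy = []
    for image_id, label in data:
        if rng.random() < rate:
            # Draw uniformly from the other num_classes - 1 identities,
            # skipping over the true label.
            new_label = rng.randrange(num_classes - 1)
            if new_label >= label:
                new_label += 1
            noisy.append((image_id, new_label))
        else:
            noisy.append((image_id, label))
    return noisy


def inject_outliers(data: List[Sample], distractors: List[str], rate: float,
                    rng: random.Random) -> List[Sample]:
    """Swap roughly `rate` of the images for out-of-dataset images, keeping the label."""
    noisy = []
    for image_id, label in data:
        if rng.random() < rate:
            # The image no longer matches any identity, but the label is unchanged.
            noisy.append((rng.choice(distractors), label))
        else:
            noisy.append((image_id, label))
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [(f"img_{i}", i % 3) for i in range(9)]
    print(inject_label_flips(clean, num_classes=3, rate=0.3, rng=rng))
    print(inject_outliers(clean, ["distractor_a", "distractor_b"], rate=0.3, rng=rng))
```

The distinction matters because a flipped sample still shows a face from the label space, so it pulls two in-dataset identities toward each other, whereas an outlier adds an image that fits no class at all.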
Paper link:
http://www.gair.link/page/paperDetail/18