Journal: Computational Intelligence and Neuroscience

A Community Detection Approach to Cleaning Extremely Large Face Database

Abstract

Although collecting images from the Internet has made it easier to build large face datasets in this Big Data era, the time-consuming manual annotation process prevents researchers from constructing larger ones, which makes automatic cleaning of noisy labels highly desirable. However, identifying mislabeled faces by machine is quite challenging because a person's face images, captured in the wild at all ages, are extraordinarily diverse. In view of this, we propose a graph-based cleaning method that mainly employs a community detection algorithm and deep CNN models to delete mislabeled images. Because the diversity of faces is preserved in multiple large communities, our cleaning results have both high cleanness and rich data diversity. With our method, we clean the extremely large MS-Celeb-1M face dataset (approximately 10 million images with noisy labels) and obtain a clean version of it called C-MS-Celeb (6,464,018 images of 94,682 celebrities). By training a single-net model on our C-MS-Celeb dataset, without fine-tuning, we achieve 99.67% at Equal Error Rate on the LFW face recognition benchmark, which is comparable to other state-of-the-art results. This demonstrates the positive effect of data cleaning on model training. To the best of our knowledge, C-MS-Celeb is the largest clean face dataset publicly available so far, which will benefit face recognition researchers.
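
The graph-based idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which community detection algorithm, similarity measure, or thresholds are used. The sketch assumes that each image of one identity already has a feature vector (standing in for a deep CNN embedding), links images whose cosine similarity exceeds a hypothetical `threshold`, and uses a deterministic variant of label propagation as a stand-in community detection algorithm. All function names and the `min_size` parameter are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_graph(features, threshold):
    """Link two images when their similarity reaches the (assumed) threshold."""
    n = len(features)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(features[i], features[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def label_propagation(adj, max_iters=100):
    """Deterministic label propagation (stand-in community detection).

    Each node repeatedly adopts the most common label among its neighbours;
    ties are broken by choosing the smallest label.
    """
    labels = {v: v for v in adj}
    for _ in range(max_iters):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue  # isolated node keeps its own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            new = min(l for l, c in counts.items() if c == best)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break
    communities = defaultdict(list)
    for v, l in labels.items():
        communities[l].append(v)
    return sorted(communities.values(), key=len, reverse=True)


def clean_identity(features, threshold=0.8, min_size=2):
    """Return indices of images kept for one identity.

    Every community with at least min_size members is kept, so several
    distinct appearance clusters can survive; small ones are dropped
    as likely mislabeled.
    """
    keep = []
    for community in label_propagation(build_graph(features, threshold)):
        if len(community) >= min_size:
            keep.extend(community)
    return sorted(keep)


# Toy 2-D "embeddings": two appearance clusters plus one outlier.
features = [(1.0, 0.0), (0.99, 0.1), (0.98, 0.05),
            (0.0, 1.0), (0.1, 0.99), (-1.0, -1.0)]
kept = clean_identity(features)
```

In this toy example both clusters are retained and only the isolated outlier is discarded, which mirrors the abstract's point that keeping multiple large communities preserves diversity while removing noise. The pairwise graph construction is quadratic in the number of images, so a dataset at the scale of MS-Celeb-1M would need a more scalable approach than this sketch.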
