Journal: Computational Intelligence and Neuroscience

A Community Detection Approach to Cleaning Extremely Large Face Database


Abstract

Although collecting images from the Internet has made it easier to build large face datasets in the Big Data era, the time-consuming manual annotation process prevents researchers from constructing larger ones, which makes the automatic cleaning of noisy labels highly desirable. However, identifying mislabeled faces by machine is quite challenging because a person’s face images, captured in the wild at all ages, are extraordinarily diverse. In view of this, we propose a graph-based cleaning method that mainly employs a community detection algorithm and deep CNN models to delete mislabeled images. Because the diversity of faces is preserved in multiple large communities, our cleaning results have both high cleanness and rich data diversity. With our method, we clean the extremely large MS-Celeb-1M face dataset (approximately 10 million images with noisy labels) and obtain a clean version of it called C-MS-Celeb (6,464,018 images of 94,682 celebrities). By training a single-net model on our C-MS-Celeb dataset, without fine-tuning, we achieve 99.67% at the Equal Error Rate on the LFW face recognition benchmark, which is comparable to other state-of-the-art results. This demonstrates the positive effect of data cleaning on model training. To the best of our knowledge, C-MS-Celeb is the largest clean face dataset that is publicly available so far, and it will benefit face recognition researchers.
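The abstract does not say which community detection algorithm or which similarity measure the authors use, so the following is only a minimal sketch of the general idea, under stated assumptions. It links images of one labeled identity when the cosine similarity of their feature vectors exceeds a threshold, runs a simple deterministic label propagation (a stand-in, not necessarily the paper's algorithm), and keeps only images that fall into communities of at least a minimum size. The function names, the threshold of 0.8, the minimum size of 2, and the hand-made 2-D vectors standing in for deep CNN features are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(features, threshold):
    """Connect two images when their feature similarity reaches the threshold."""
    adj = {i: set() for i in range(len(features))}
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if cosine(features[i], features[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def label_propagation(adj, max_iters=100):
    """Deterministic label propagation: each node adopts the most common
    label among its neighbours, breaking ties by the smallest label."""
    labels = {node: node for node in adj}
    for _ in range(max_iters):
        changed = False
        for node in sorted(adj):
            if not adj[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[nb] for nb in adj[node])
            best = max(counts.values())
            new_label = min(l for l, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = defaultdict(list)
    for node, label in labels.items():
        communities[label].append(node)
    return list(communities.values())


def clean_identity(features, threshold=0.8, min_size=2):
    """Return indices of images kept for one identity: members of every
    community with at least min_size images (hypothetical rule)."""
    keep = []
    for members in label_propagation(build_graph(features, threshold)):
        if len(members) >= min_size:
            keep.extend(members)
    return sorted(keep)


# Toy stand-ins for CNN features of images that share one noisy label.
features = [
    [1.0, 0.0], [0.99, 0.1], [0.98, 0.15],  # one appearance cluster
    [0.0, 1.0], [0.1, 0.99],                # a second, different-looking cluster
    [-1.0, -1.0],                           # isolated image, treated as mislabeled
]
kept = clean_identity(features)
```

In this toy run both clusters survive and only the isolated image is dropped, which mirrors the abstract's point that keeping multiple large communities preserves diversity while removing outliers. A real pipeline would need an approximate nearest-neighbour search rather than the quadratic pairwise loop used here.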