A data-driven approach to cleaning large face datasets

Abstract

Large face datasets are important for advancing face recognition research, but they are tedious to build, because a lot of work has to go into cleaning the huge amount of raw data. To facilitate this task, we describe an approach to building face datasets that starts with detecting faces in images returned from searches for public figures on the Internet, followed by discarding those not belonging to each queried person. We formulate the problem of identifying the faces to be removed as a quadratic programming problem, which exploits the observations that faces of the same person should look similar, have the same gender, and normally appear at most once per image. Our results show that this method can reliably clean a large dataset, leading to a considerable reduction in the work needed to build it. Finally, we are releasing the FaceScrub dataset that was created using this approach. It consists of 141,130 faces of 695 public figures and can be obtained from http://vintage.winklerbros.net/facescrub.html.
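The abstract does not give the quadratic program itself, so the following is only a minimal sketch of how the three observations (similar appearance, same gender, at most one face per image) could be combined in a small box-constrained QP. The objective, the weights `lam` and `mu`, the gender penalty of 0.5, the 0.5 keep threshold, and the function name `keep_scores` are all assumptions made for illustration, not the authors' formulation. Each face i gets a keep score x_i in [0, 1], and the sketch minimizes 0.5·xᵀQx − cᵀx by projected gradient descent.

```python
def keep_scores(sim, gender_ok, image_ids, lam=1.0, mu=0.9, iters=500):
    """Minimize 0.5*x'Qx - c'x subject to 0 <= x <= 1 (hypothetical objective)."""
    n = len(sim)
    # Linear reward c_i: mean similarity to the other faces,
    # minus an assumed penalty of 0.5 when the gender does not match.
    c = [
        sum(sim[i][j] for j in range(n) if j != i) / (n - 1)
        - (0.0 if gender_ok[i] else 0.5)
        for i in range(n)
    ]
    # Quadratic term Q: lam on the diagonal, mu between two faces that come
    # from the same image, so co-selecting them is penalized.
    # Q is positive definite whenever mu < lam, which keeps the QP convex.
    Q = [
        [lam if i == j else (mu if image_ids[i] == image_ids[j] else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    # Step size 1/L, where the largest absolute row sum bounds Q's top eigenvalue.
    step = 1.0 / max(sum(abs(v) for v in row) for row in Q)
    x = [0.5] * n
    for _ in range(iters):
        grad = [sum(Q[i][j] * x[j] for j in range(n)) - c[i] for i in range(n)]
        # Gradient step followed by projection onto the box [0, 1].
        x = [min(1.0, max(0.0, x[i] - step * grad[i])) for i in range(n)]
    return x


# Toy data: faces 0-3 show the queried person, face 4 is a look-alike that
# shares image "d" with face 3, and face 5 fails the gender check.
sim = [
    [1.0, 0.9, 0.9, 0.9, 0.7, 0.6],
    [0.9, 1.0, 0.9, 0.9, 0.7, 0.6],
    [0.9, 0.9, 1.0, 0.9, 0.7, 0.6],
    [0.9, 0.9, 0.9, 1.0, 0.7, 0.6],
    [0.7, 0.7, 0.7, 0.7, 1.0, 0.3],
    [0.6, 0.6, 0.6, 0.6, 0.3, 1.0],
]
gender_ok = [True, True, True, True, True, False]
image_ids = ["a", "b", "c", "d", "d", "e"]

scores = keep_scores(sim, gender_ok, image_ids)
kept = [i for i, s in enumerate(scores) if s > 0.5]
print(kept)
```

In this toy setup, face 4 would pass a pure similarity threshold on its own, but the same-image term pushes its score to zero because face 3 matches better. A real pipeline would derive `sim` from face descriptors and would need the weights tuned on labeled data.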

Bibliographic details

Source
IEEE International Conference on Image Processing | 2014 | pp. 343-347 | 5 pages
Authors
Hong-Wei Ng; Stefan Winkler
Original format: PDF
Keywords
face recognition; visual databases; FaceScrub dataset; Internet; data-driven approach; face detection; face recognition research; large face datasets cleaning; public figures; quadratic programming problem; Computer vision; Detectors; Face; Face recognition; Support vector machines; Vectors; Face Recognition; Outlier Detection;

Similar literature

1. To clean or not to clean phenotypic datasets for outlier plants in genetic analyses? [J]. Prado Santiago Alvarez, Sanchez Isabelle, Cabrera-Bosquet Llorenc. Journal of Experimental Botany. 2019, no. 15
2. Efficient creation of datasets for data-driven power system applications [J]. Venzke Andreas, Molzahn Daniel K., Chatzivasileiadis Spyros. Electric Power Systems Research. 2021, no. Jana
3. Towards data-driven energy communities: A review of open-source datasets, models and tools [J]. Kazmi Hussain, Munne-Collado Ingrid, Mehmood Fahad. Renewable & Sustainable Energy Reviews. 2021, no. Sepa
4. A data-driven approach to cleaning large face datasets [C]. Hong-Wei Ng, Stefan Winkler. IEEE International Conference on Image Processing. 2014
5. Scaling the Technology Opportunity Analysis text data mining methodology: Data extraction, cleaning, online analytical processing analysis, and reporting of large multi-source datasets [D]. George, Richard Peyton. 2006
6. Creating longitudinal datasets and cleaning existing data identifiers in a cystic fibrosis registry using a novel Bayesian probabilistic approach from astronomy [O]. Peter Donald Hurley, Seb Oliver, Anil Mehta
7. A DATA-DRIVEN APPROACH TO CLEANING LARGE FACE DATASETS [O]. Hong-wei Ng, Stefan Winkler. 2015
