Towards Automatic Construction of Diverse, High-Quality Image Datasets

Yao Yazhou; Zhang Jian; Shen Fumin; Liu Li; Zhu Fan; Zhang Dongxiang; Shen Heng Tao

Abstract

The availability of labeled image datasets has been shown to be critical for high-level image understanding and continuously drives progress in feature design and model development. However, constructing labeled image datasets is laborious and monotonous. To eliminate manual annotation, we propose a novel image dataset construction framework that employs multiple textual queries. We aim to collect diverse and accurate images from the Web for given queries. Specifically, we formulate the removal of noisy textual queries and the filtering of noisy images as a multi-view and a multi-instance learning problem, respectively. The proposed approach not only improves the accuracy but also enhances the diversity of the selected images. To verify its effectiveness, we construct an image dataset with 100 categories. Experiments show significant performance gains from using the data generated by our approach on several tasks, such as image classification, cross-dataset generalization, and object detection. The proposed method also consistently outperforms existing weakly supervised and web-supervised approaches.

Bibliographic details

Source
IEEE Transactions on Knowledge and Data Engineering | 2020, Issue 6 | pp. 1199-1211 | 13 pages
Authors
Yao Yazhou; Zhang Jian; Shen Fumin; Liu Li; Zhu Fan; Zhang Dongxiang; Shen Heng Tao
Author affiliations

Incept Inst Artificial Intelligence Abu Dhabi 111999 U Arab Emirates;

Univ Technol Sydney Global Big Data Technol Ctr Ultimo NSW 2007 Australia;

Univ Elect Sci & Technol China Sch Comp Sci & Engn Chengdu 610054 Sichuan Peoples R China;

Incept Inst Artificial Intelligence Abu Dhabi 111999 U Arab Emirates;

Incept Inst Artificial Intelligence Abu Dhabi 111999 U Arab Emirates;

Univ Elect Sci & Technol China Sch Comp Sci & Engn Chengdu 610054 Sichuan Peoples R China;

Univ Elect Sci & Technol China Sch Comp Sci & Engn Chengdu 610054 Sichuan Peoples R China;

Original format: PDF
Language: English
Keywords
Noise measurement; Search engines; Manuals; Visualization; Scalability; Data models; Task analysis; Image dataset construction; multiple textual queries; dataset diversity;

Similar literature

1. Construction of Diverse Image Datasets From Web Collections With Limited Labeling [J]. Niluthpol Chowdhury Mithun, Rameswar Panda, Amit K. Roy-Chowdhury. IEEE Transactions on Circuits and Systems for Video Technology. 2020, Issue 4
2. An ex vivo imaging pipeline for producing high-quality and high-resolution diffusion-weighted imaging datasets [J]. Dyrby TB, Baare WF, Alexander DC. Human Brain Mapping. 2011, Issue 4
3. Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures [J]. Bernardi Raffaella, Cakici Ruket, Elliott Desmond. The Journal of Artificial Intelligence Research. 2016, Issue 10
4. Well Begun Is Half Done: Generating High-Quality Seeds for Automatic Image Dataset Construction from Web [C]. Yan Xia, Xudong Cao, Fang Wen. European Conference on Computer Vision. 2014
5. Automatic construction of arterial and venous vascular trees in fundus images [D]. Hu, Qiao. 2016
6. Kinome-wide Activity Modeling from Diverse Public High-Quality Datasets [O]. Stephan C. Schürer, Steven M. Muskal
7. An ex vivo imaging pipeline for producing high-quality and high-resolution diffusion-weighted imaging datasets [O]. Tim B. Dyrby, William F.C. Baaré, Daniel C. Alexander. 2010
8. Automatic, Quantitative Image Analysis System for Construction Materials. Executive Summary [R]. Shi, D. 1988
