Venue: European Conference on Computer Vision

Well Begun Is Half Done: Generating High-Quality Seeds for Automatic Image Dataset Construction from Web


Abstract

We present a fully automatic approach for constructing a large-scale, high-precision dataset from noisy web images. Within the overall pipeline, we focus on generating high-quality seed images for subsequent dataset growing. As we show, high-quality seeds are essential, yet previous work has paid relatively little attention to generating them automatically. In this work, we propose a density score based on rank-order distance to identify positive seed images. The basic idea is that images relevant to a concept are typically tightly clustered, while outliers are widely scattered. Through adaptive thresholding, we ensure that the selected seeds are as numerous and as accurate as possible. Starting from these high-quality seeds, we grow a high-quality dataset by dividing the seeds and conducting iterative negative and positive mining. Our system can automatically collect thousands of images for one concept/class with a precision of 95% or more. Comparisons with recent state-of-the-art methods also demonstrate our method's superior performance.
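
The abstract does not spell out the density score or the thresholding rule, so the following is only a rough sketch of the general idea that tightly clustered samples score higher than scattered outliers. It assumes the symmetric, normalized rank-order distance as commonly defined in the rank-order clustering literature. The function names, the density score (inverse of the mean rank-order distance to the k nearest neighbors), and the mean-score threshold are hypothetical stand-ins, not the authors' formulation. Feature vectors are toy 1-D tuples rather than real image descriptors.

```python
import math


def neighbor_orders(points):
    """For each point, list all point indices sorted by Euclidean distance (self first)."""
    n = len(points)
    return [
        sorted(range(n), key=lambda j, i=i: (math.dist(points[i], points[j]), j != i))
        for i in range(n)
    ]


def rank_tables(orders):
    """ranks[x][y] is the position of y in x's neighbor list (0 for x itself)."""
    return [{j: r for r, j in enumerate(order)} for order in orders]


def rank_order_distance(a, b, orders, ranks):
    """Symmetric, normalized rank-order distance between distinct points a and b."""
    def asym(x, y):
        # Sum, over x's neighbors up to and including y, of their ranks in y's list.
        return sum(ranks[y][orders[x][i]] for i in range(ranks[x][y] + 1))

    return (asym(a, b) + asym(b, a)) / min(ranks[a][b], ranks[b][a])


def density_scores(points, k=2):
    """Hypothetical density score: 1 / (1 + mean rank-order distance to k nearest neighbors)."""
    orders = neighbor_orders(points)
    ranks = rank_tables(orders)
    scores = []
    for i, order in enumerate(orders):
        nbrs = order[1:k + 1]  # skip the point itself at position 0
        mean_d = sum(rank_order_distance(i, j, orders, ranks) for j in nbrs) / len(nbrs)
        scores.append(1.0 / (1.0 + mean_d))
    return scores


def select_seeds(points, k=2):
    """Hypothetical adaptive threshold: keep points scoring at least the mean score."""
    scores = density_scores(points, k)
    threshold = sum(scores) / len(scores)
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy data: four nearby points plus two far-away outliers (indices 4 and 5).
points = [(0.0,), (1.0,), (2.2,), (3.6,), (20.0,), (-30.0,)]
seeds = select_seeds(points, k=2)
print(seeds)
```

On this toy data the two distant points receive the lowest scores and fall below the threshold, so only the four clustered points are kept as seeds. Because the threshold is derived from the score distribution rather than fixed in advance, it adapts to each concept's data, which is the spirit (though not necessarily the form) of the adaptive thresholding the abstract mentions.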
