International Journal of Multimedia Information Retrieval

On-the-fly learning for visual search of large-scale image and video datasets

Abstract

The objective of this work is to visually search large-scale video datasets for semantic entities specified by a text query. The paradigm we explore is constructing visual models for such semantic entities on-the-fly, i.e. at run time, by using an image search engine to source visual training data for the text query. The approach combines fast and accurate learning and retrieval, and enables videos to be returned within seconds of specifying a query. We describe three classes of queries, each with its associated visual search method: object instances (using a bag of visual words approach for matching); object categories (using a discriminative classifier for ranking key frames); and faces (using a discriminative classifier for ranking face tracks). We discuss the features suitable for each class of query, for example Fisher vectors or features derived from convolutional neural networks (CNNs), and how these choices impact on the trade-off between three important performance measures for a real-time system of this kind, namely: (1) accuracy, (2) memory footprint, and (3) speed. We also discuss and compare a number of important implementation issues, such as how to remove ‘outliers’ in the downloaded images efficiently, and how to best obtain a single descriptor for a face track. We also sketch the architecture of the real-time on-the-fly system. Quantitative results are given on a number of large-scale image and video benchmarks (e.g. TRECVID INS, MIRFLICKR-1M), and we further demonstrate the performance and real-world applicability of our methods over a dataset sourced from 10,000 hours of unedited footage from BBC News, comprising 5M+ key frames.
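As a rough illustration of the pipeline the abstract describes (images downloaded for the text query plus a fixed negative pool train a linear classifier at run time, which then ranks corpus key frames by score), the following is a minimal Python sketch. All names (on_the_fly_search, prune_outliers, face_track_descriptor, the pre-computed feature arrays) are hypothetical and assume features have already been extracted, e.g. with a pre-trained CNN; this is an assumption-laden sketch, not the authors' implementation.

# Minimal sketch of on-the-fly category search over pre-computed
# CNN (or Fisher vector) features; hypothetical names, not the authors' code.
import numpy as np
from sklearn.svm import LinearSVC

def on_the_fly_search(query_feats, negative_pool, keyframe_feats, top_k=100):
    # Positives: features of images downloaded for the text query;
    # negatives: a fixed, query-independent pool (a common choice).
    X = np.vstack([query_feats, negative_pool])
    y = np.concatenate([np.ones(len(query_feats)),
                        np.zeros(len(negative_pool))])
    clf = LinearSVC(C=1.0).fit(X, y)
    # Rank every key frame in the corpus by its linear classifier score.
    scores = keyframe_feats @ clf.coef_.ravel() + clf.intercept_[0]
    return np.argsort(-scores)[:top_k]

def prune_outliers(query_feats, keep_frac=0.8):
    # One simple way to drop 'outliers' among the downloaded images:
    # keep those closest to the mean query descriptor (an assumption here;
    # the paper compares dedicated strategies for this step).
    mean = query_feats.mean(axis=0)
    dists = np.linalg.norm(query_feats - mean, axis=1)
    keep = np.argsort(dists)[: int(len(query_feats) * keep_frac)]
    return query_feats[keep]

def face_track_descriptor(per_frame_descs):
    # A single descriptor for a face track via mean-pooling plus L2
    # normalisation (one plausible option among those the abstract alludes to).
    d = per_frame_descs.mean(axis=0)
    return d / (np.linalg.norm(d) + 1e-12)

In this sketch, ranking reduces to a single matrix-vector product against the stored key-frame features, which is what makes returning results within seconds of a query feasible; the memory footprint and accuracy then hinge on the choice and dimensionality of those features, the trade-off the abstract highlights.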