
Dynamic Eye Movement Datasets and Learnt Saliency Models for Visual Action Recognition

Abstract

Systems based on bag-of-words models, operating on image features collected at the maxima of sparse interest-point operators, have been extremely successful for both computer-based visual object and action recognition tasks. While the sparse, interest-point based approach to recognition is not inconsistent with visual processing in biological systems that operate in "saccade and fixate" regimes, the knowledge, methodology, and emphasis in the human and computer vision communities remain sharply distinct. Here, we make three contributions aimed at bridging this gap. First, we complement existing state-of-the-art large-scale dynamic computer vision datasets such as Hollywood-2 [1] and UCF Sports [2] with human eye movements collected under the ecological constraints of the visual action recognition task. To our knowledge, these are the first large-scale human eye-tracking datasets collected for video (497,107 frames, each viewed by 16 subjects), unique in terms of their (a) large scale and computer vision relevance, (b) dynamic video stimuli, and (c) task control, as opposed to free viewing. Second, we introduce novel dynamic consistency and alignment models, which underline the remarkable stability of patterns of visual search among subjects. Third, we leverage the massive amounts of collected data to pursue studies and to build automatic, end-to-end trainable computer vision systems based on human eye movements. Our studies not only shed light on the differences between computer vision spatio-temporal interest-point sampling strategies and human fixations, and on their impact on visual recognition performance, but also demonstrate that human fixations can be accurately predicted and, when used in an end-to-end automatic system leveraging some of the most advanced computer vision practice, can lead to state-of-the-art results.
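
To make the contrast between interest-point sampling and fixation-guided sampling concrete, below is a minimal, illustrative sketch, not the paper's implementation: a toy bag-of-words action recognition pipeline in which local descriptors are extracted at the maxima of a saliency map, standing in for predicted human fixations. All function names, the placeholder saliency map, and the random data are assumptions for illustration only.

```python
# Illustrative sketch only: bag-of-words action recognition with descriptors
# sampled at saliency maxima (a stand-in for predicted human fixations)
# instead of sparse interest-point maxima. Not the paper's implementation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def sample_at_saliency_maxima(saliency_map, num_points=50):
    """Return (row, col) coordinates of the strongest saliency responses."""
    flat = np.argsort(saliency_map, axis=None)[-num_points:]
    return np.column_stack(np.unravel_index(flat, saliency_map.shape))

def extract_descriptors(video_frames, points, patch=8):
    """Toy local descriptor: flattened pixel patch around each sampled point."""
    descs = []
    for frame in video_frames:
        padded = np.pad(frame, patch, mode="edge")
        for r, c in points:
            descs.append(padded[r:r + 2 * patch, c:c + 2 * patch].ravel())
    return np.array(descs)

def bow_histogram(descriptors, codebook):
    """Quantize descriptors against the codebook; return a normalized histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy usage: random arrays stand in for real video clips and action labels.
rng = np.random.default_rng(0)
videos = [rng.random((4, 64, 64)) for _ in range(20)]  # 20 clips, 4 frames each
labels = rng.integers(0, 2, size=20)                   # 2 action classes

all_descs, per_video = [], []
for clip in videos:
    saliency = clip.mean(axis=0)  # placeholder saliency map, one per clip
    pts = sample_at_saliency_maxima(saliency, num_points=20)
    d = extract_descriptors(clip, pts)
    per_video.append(d)
    all_descs.append(d)

# Build the visual codebook over all descriptors, then classify histograms.
codebook = KMeans(n_clusters=32, n_init=3, random_state=0).fit(np.vstack(all_descs))
X = np.array([bow_histogram(d, codebook) for d in per_video])
clf = LinearSVC().fit(X, labels)
print("train accuracy:", clf.score(X, labels))
```

Swapping `sample_at_saliency_maxima` for a sparse interest-point detector reproduces the conventional pipeline the abstract refers to; the rest of the bag-of-words machinery is unchanged, which is what makes the two sampling strategies directly comparable.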
