IEEE Transactions on Pattern Analysis and Machine Intelligence

Actions in the Eye: Dynamic Gaze Datasets and Learnt Saliency Models for Visual Recognition



Abstract

Systems based on bag-of-words models, built from image features collected at the maxima of sparse interest point operators, have been used successfully for both visual object and action recognition tasks in computer vision. While the sparse, interest-point based approach to recognition is not inconsistent with visual processing in biological systems that operate in ‘saccade and fixate’ regimes, the methodology and emphasis in the human and computer vision communities remain sharply distinct. Here, we make three contributions aiming to bridge this gap. First, we complement existing state-of-the-art large-scale dynamic computer vision annotated datasets, such as Hollywood-2 and UCF Sports, with human eye movements collected under the ecological constraints of visual action and scene context recognition tasks. To our knowledge, these are the first large human eye tracking datasets collected and made publicly available for video (497,107 frames, each viewed by 19 subjects), unique in terms of their large scale and computer vision relevance, dynamic video stimuli, and task control as opposed to free viewing. Second, we introduce novel dynamic consistency and alignment measures, which underline the remarkable stability of patterns of visual search among subjects. Third, we leverage the significant amount of collected data to pursue studies and build automatic, end-to-end trainable computer vision systems based on human eye movements. Our studies not only shed light on the differences between computer vision spatio-temporal interest point sampling strategies and human fixations, as well as their impact on visual recognition performance, but also demonstrate that human fixations can be accurately predicted and, when used in an end-to-end automatic recognition system, can lead to state-of-the-art results.
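The opening sentence refers to the standard bag-of-visual-words recipe: local descriptors extracted around sparse interest points (or, in this work, human fixations) are vector-quantized against a learned codebook, and each video is represented as a histogram of visual-word counts fed to a classifier. Below is a minimal illustrative sketch of that pipeline in Python; the vocabulary size, descriptor dimensionality, and choice of k-means are generic assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def bow_histograms(descriptor_sets, n_words=8, seed=0):
    """Vector-quantize local descriptors (e.g. HOG/HOF patches sampled at
    interest points or fixations) into a visual vocabulary, then represent
    each video as an L1-normalized histogram of visual-word counts."""
    kmeans = KMeans(n_clusters=n_words, random_state=seed, n_init=4)
    kmeans.fit(np.vstack(descriptor_sets))          # codebook from pooled descriptors
    hists = []
    for d in descriptor_sets:
        words = kmeans.predict(d)                   # assign each descriptor to a word
        h = np.bincount(words, minlength=n_words).astype(np.float64)
        hists.append(h / max(h.sum(), 1.0))         # normalize per video
    return np.asarray(hists), kmeans

# Toy usage: random descriptors stand in for real HOG/HOF features;
# a linear classifier on the histograms would complete the pipeline.
rng = np.random.default_rng(0)
videos = [rng.normal(size=(50, 96)) for _ in range(4)]
X, vocab = bow_histograms(videos, n_words=8)
```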
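To plug recorded gaze into such a pipeline, per-frame fixations are commonly converted into an empirical saliency map by smoothing fixation locations with a Gaussian roughly matching the foveal extent. The sketch below shows this standard construction; the smoothing width and normalization are illustrative choices, not the paper's exact parameters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_saliency_map(fixations, height, width, sigma_px=25.0):
    """Empirical saliency map for one frame: a delta at every recorded
    fixation, blurred by a Gaussian. sigma_px approximates the fovea in
    pixels and is an assumed value, not one taken from the paper."""
    m = np.zeros((height, width), dtype=np.float64)
    for x, y in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < height and 0 <= xi < width:    # drop off-screen samples
            m[yi, xi] += 1.0
    m = gaussian_filter(m, sigma=sigma_px)
    s = m.sum()
    return m / s if s > 0 else m                    # normalize to a distribution

# Fixations pooled over subjects for one 640x480 frame.
smap = fixation_saliency_map([(320, 240), (330, 250), (100, 400)], 480, 640)
```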

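The "stability of patterns of visual search among subjects" is typically quantified with a leave-one-subject-out protocol: a map built from all but one subject is scored on how well it predicts the held-out subject's fixations. The sketch below uses ROC AUC with uniformly sampled control pixels as negatives, a common consistency measure in the eye-tracking literature; the paper introduces its own dynamic consistency and alignment measures, which this does not reproduce. It reuses fixation_saliency_map from the previous sketch and assumes fixations lie within the frame.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def inter_subject_auc(fixations_by_subject, height, width, sigma_px=25.0):
    """Leave-one-subject-out consistency: build a saliency map from all
    other subjects' fixations and ask how well it separates the held-out
    subject's fixated pixels from uniformly sampled control pixels."""
    rng = np.random.default_rng(0)
    aucs = []
    for held_out, fixs in enumerate(fixations_by_subject):
        others = [f for i, s in enumerate(fixations_by_subject)
                  if i != held_out for f in s]
        smap = fixation_saliency_map(others, height, width, sigma_px)
        pos = np.array([smap[int(y), int(x)] for x, y in fixs])  # fixated pixels
        neg = smap[rng.integers(0, height, len(pos)),            # random controls
                   rng.integers(0, width, len(pos))]
        labels = np.r_[np.ones(len(pos)), np.zeros(len(neg))]
        aucs.append(roc_auc_score(labels, np.r_[pos, neg]))
    return float(np.mean(aucs))

# Two toy subjects whose gaze clusters on the same region score near 1.0.
subj = [[(320, 240), (318, 244)], [(322, 238), (319, 241)]]
print(inter_subject_auc(subj, 480, 640))
```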
