IEEE Transactions on Image Processing

Video Saliency Prediction Using Spatiotemporal Residual Attentive Networks

Abstract

This paper proposes a novel residual attentive learning network architecture for predicting dynamic eye-fixation maps. The proposed model emphasizes two essential issues: effective spatiotemporal feature integration and multi-scale saliency learning. For the first, the appearance and motion streams are tightly coupled via dense residual cross connections, which integrate appearance information with multi-layer, comprehensive motion features in a residual and dense way. Unlike traditional two-stream models, which learn appearance and motion features separately, this design allows early, multi-path information exchange between the two domains, leading to a unified and powerful spatiotemporal learning architecture. For the second, we propose a composite attention mechanism that learns multi-scale local attentions and global attention priors end-to-end; it enhances the fused spatiotemporal features by emphasizing important features at multiple scales. A lightweight convolutional Gated Recurrent Unit (convGRU), which is flexible in small-training-data situations, is used to model long-term temporal characteristics. Extensive experiments on four benchmark datasets clearly demonstrate the advantage of the proposed video saliency model over its competitors and the effectiveness of each component of our network. Our code and all results will be available at https://github.com/ashleylqx/STRA-Net.
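
The abstract names a lightweight convolutional Gated Recurrent Unit (convGRU) as its long-term temporal model. For orientation only, below is a minimal sketch of a generic convGRU cell in PyTorch; the class name, channel sizes, and kernel size are illustrative assumptions, not the authors' implementation (the official code is at https://github.com/ashleylqx/STRA-Net):

    import torch
    import torch.nn as nn

    class ConvGRUCell(nn.Module):
        """GRU gating with 2-D convolutions in place of fully connected
        transforms, so the hidden state keeps its spatial layout."""
        def __init__(self, in_channels, hidden_channels, kernel_size=3):
            super().__init__()
            padding = kernel_size // 2
            # Update gate z and reset gate r, computed jointly from [x, h].
            self.gates = nn.Conv2d(in_channels + hidden_channels,
                                   2 * hidden_channels, kernel_size,
                                   padding=padding)
            # Candidate hidden state.
            self.cand = nn.Conv2d(in_channels + hidden_channels,
                                  hidden_channels, kernel_size,
                                  padding=padding)

        def forward(self, x, h):
            z, r = torch.sigmoid(
                self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
            h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
            return (1 - z) * h + z * h_tilde  # blend old state and candidate

    # Roll the cell over a sequence of fused spatiotemporal features,
    # shaped (batch, time, channels, height, width); all sizes are dummy.
    cell = ConvGRUCell(in_channels=64, hidden_channels=32)
    feats = torch.randn(2, 10, 64, 28, 28)
    h = torch.zeros(2, 32, 28, 28)
    for t in range(feats.size(1)):
        h = cell(feats[:, t], h)  # h accumulates long-term temporal context

Because all gates are small shared convolutions, a convGRU has fewer parameters than a convLSTM of the same width, which is consistent with the abstract's point that it remains flexible when training data is limited.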
