2018 International Conference on Image and Video Processing, and Artificial Intelligence

Spatial-Temporal Attention in Bi-LSTM Networks Based on Multiple Features for Video Captioning



Abstract

Automatically generating rich natural language descriptions for open-domain videos is among the most challenging tasks in computer vision, natural language processing, and machine learning. Building on the general encoder-decoder framework, we propose a bidirectional long short-term memory (Bi-LSTM) network with spatial-temporal attention over multiple features of objects, activities, and scenes. The network learns valuable and complementary high-level visual representations and dynamically focuses on the most important context information of diverse frames within different subsets of videos. Experimental results show that the proposed method achieves performance competitive with or better than the state of the art on the MSVD video dataset.
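The abstract does not specify the attention formulation in detail; the following is a minimal numpy sketch of generic soft temporal attention over per-frame features, as commonly used in encoder-decoder video captioning. All names, dimensions, and the scoring function (additive/tanh attention) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    z = np.exp(x - x.max())
    return z / z.sum()

def temporal_attention(frame_feats, hidden, W_v, W_h, w):
    """Soft temporal attention (illustrative): score each frame feature
    against the decoder hidden state, then return the attention-weighted
    context vector and the attention weights."""
    scores = np.tanh(frame_feats @ W_v + hidden @ W_h) @ w  # shape (T,)
    alpha = softmax(scores)        # attention weights over frames, sum to 1
    context = alpha @ frame_feats  # weighted sum of frame features, shape (d,)
    return context, alpha

# Hypothetical sizes: T frames, d-dim features, a-dim attention space.
T, d, a = 8, 16, 32
frame_feats = rng.normal(size=(T, d))  # per-frame features (e.g., objects/scenes)
hidden = rng.normal(size=d)            # decoder LSTM hidden state
W_v = rng.normal(size=(d, a))
W_h = rng.normal(size=(d, a))
w = rng.normal(size=a)

context, alpha = temporal_attention(frame_feats, hidden, W_v, W_h, w)
```

At each decoding step the context vector would be fed to the LSTM decoder alongside the previous word embedding; spatial attention would apply the same idea within a frame's feature map rather than across frames.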
