International Conference on Image and Video Processing, and Artificial Intelligence

Spatial-Temporal Attention in Bi-LSTM Networks based on Multiple Features for Video Captioning



Abstract

Automatically generating rich natural-language descriptions for open-domain videos is among the most challenging tasks in computer vision, natural language processing and machine learning. Building on the general encoder-decoder framework, we propose a bidirectional long short-term memory (Bi-LSTM) network with spatial-temporal attention over multiple features of objects, activities and scenes. The network learns valuable, complementary high-level visual representations, and dynamically focuses on the most important context information of diverse frames within different subsets of a video. Experimental results show that the proposed methods achieve performance competitive with, or better than, the state of the art on the MSVD video dataset.
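As a rough illustration of the temporal side of the attention mechanism the abstract describes, the sketch below computes soft attention weights over per-frame encoder features (such as Bi-LSTM outputs) given a decoder state. The additive scoring form and all names (`temporal_attention`, `W_f`, `W_s`, `w`) are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def temporal_attention(frame_feats, decoder_state, W_f, W_s, w):
    """Soft attention over frames (illustrative additive scoring).

    frame_feats:   (T, d_f) per-frame encoder features, e.g. Bi-LSTM outputs
    decoder_state: (d_s,)   current decoder hidden state
    """
    # Score each frame: e_t = w^T tanh(W_f f_t + W_s s)
    scores = np.tanh(frame_feats @ W_f.T + decoder_state @ W_s.T) @ w  # (T,)
    alphas = softmax(scores)        # attention weights, one per frame
    context = alphas @ frame_feats  # (d_f,) weighted sum of frame features
    return context, alphas

rng = np.random.default_rng(0)
T, d_f, d_s, d_a = 5, 8, 6, 4   # frames, feature, state, attention dims
feats = rng.normal(size=(T, d_f))
state = rng.normal(size=d_s)
W_f = rng.normal(size=(d_a, d_f))
W_s = rng.normal(size=(d_a, d_s))
w = rng.normal(size=d_a)

ctx, alphas = temporal_attention(feats, state, W_f, W_s, w)
print(round(alphas.sum(), 6))  # softmax weights sum to 1.0
```

The context vector `ctx` would then be fed to the decoder LSTM at each word-generation step; in the paper's multi-feature setting, one such attended context per feature stream (objects, activities, scenes) would be combined before decoding.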

