International Conference on Image, Vision and Computing

Generating video description with Long-Short Term Memory



Abstract

Connecting visual imagery with descriptive language is a challenge for computer vision and machine translation. Inspired by image description work, which uses an `encoder-decoder' model to translate an image into a target sentence, we propose an approach that generates descriptions for video. Unlike an image, which records information at a single moment, video has a time-series property, so generating a video description requires encoding dynamic temporal structure. Our model takes into account both global and local information. First, our approach extracts the features of sampled frames with a Convolutional Neural Network (CNN) pre-trained for image classification. Second, we obtain the global feature of the video by max pooling the frame features. Third, we divide the Long-Short Term Memory (LSTM) into two parts: one encodes the frame features into a local feature, and the other decodes the combined global and local information into the target sentence. Finally, we compare two variants of our model with recent work using BLEU metrics on the YouTube dataset.
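The pipeline the abstract describes (CNN frame features, max-pooled global feature, LSTM encoder for the local feature, fused input for the decoder) can be sketched as follows. This is a minimal illustration with a toy NumPy LSTM cell and random stand-in frame features; the dimensions, the concatenation-based fusion, and all variable names are assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for CNN outputs: 8 sampled frames, 16-dim feature each (assumed sizes).
frame_feats = rng.standard_normal((8, 16))

# Global video feature: element-wise max pooling over the frame axis.
global_feat = frame_feats.max(axis=0)          # shape (16,)

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gate pre-activations stacked as [input, forget, output, candidate]."""
    z = W @ x + U @ h + b                      # shape (4 * hidden_size,)
    H = h.size
    i, f, o = (1.0 / (1.0 + np.exp(-z[k * H:(k + 1) * H])) for k in range(3))
    g = np.tanh(z[3 * H:])                     # candidate cell state
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

H = 16                                         # assumed hidden size
W = rng.standard_normal((4 * H, 16)) * 0.1     # input weights
U = rng.standard_normal((4 * H, H)) * 0.1      # recurrent weights
b = np.zeros(4 * H)

# Encoder LSTM: run over the sampled frames; final hidden state is the local feature.
h, c = np.zeros(H), np.zeros(H)
for x in frame_feats:
    h, c = lstm_step(x, h, c, W, U, b)
local_feat = h

# Decoder input carries both global and local information (fused here by
# concatenation as one plausible choice); a decoder LSTM would emit the sentence.
decoder_input = np.concatenate([global_feat, local_feat])   # shape (32,)
```

In a real system the random `frame_feats` would come from a pre-trained CNN and the decoder would be a second trained LSTM over the vocabulary; the sketch only shows how global and local features are produced and combined.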
