International Conference on Image, Vision and Computing

Generating video description with Long-Short Term Memory



Abstract

Connecting visual imagery with descriptive language is a challenge for computer vision and machine translation. Inspired by image description work, which uses an `encoder-decoder' model to translate an image into a target sentence, we propose an approach that generates descriptions for video. Unlike an image, which records information at a single moment, a video has temporal structure, so generating a video description requires encoding dynamic temporal structure. Our model takes both global and local information into account. First, our approach extracts features from sampled frames with a Convolutional Neural Network (CNN) pre-trained for image classification. Second, we obtain the global feature of the video by max pooling the frame features. Third, we split the Long-Short Term Memory (LSTM) network into two parts: one encodes the frame features into a local feature, and the other decodes the combined global and local information into the target sentence. Finally, we compare two variants of our model with recent work using BLEU metrics on the YouTube dataset.
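The global-feature step described above (element-wise max pooling of per-frame CNN features over time) can be sketched as follows. This is a minimal illustration, not the authors' code: the frame count, the 4096-dimensional feature size, and the random stand-in features are assumptions, and the mean is used only as a placeholder for the LSTM-encoded local feature.

```python
import numpy as np

# Illustrative setup: 26 sampled frames, each represented by a 4096-d
# CNN feature vector (random stand-ins for real fc-layer activations).
rng = np.random.default_rng(0)
frame_features = rng.standard_normal((26, 4096))

# Global video feature: element-wise max over the time axis, so each
# dimension keeps its strongest response across all sampled frames.
global_feature = frame_features.max(axis=0)   # shape (4096,)

# Placeholder for the local feature the encoder LSTM would produce;
# the real model encodes the frame sequence, here we just average it.
local_feature = frame_features.mean(axis=0)   # shape (4096,)

# The decoder conditions on both global and local information; a simple
# way to combine them is concatenation before decoding into a sentence.
decoder_input = np.concatenate([global_feature, local_feature])  # (8192,)
```

Max pooling makes the global feature invariant to where in the video a response occurs, which is why it complements the order-sensitive LSTM encoding of the same frames.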
