Automatic Video Captioning via Multi-channel Sequential Encoding

Abstract

In this paper, we propose a novel two-stage video captioning framework composed of (1) a multi-channel video encoder and (2) a sentence-generating language decoder. Both the encoder and the decoder are based on recurrent neural networks with long short-term memory (LSTM) cells. Our system can take videos of arbitrary length as input. Compared with previous sequence-to-sequence video captioning frameworks, the proposed model can handle multiple channels of video representations and jointly learn how to combine them. The proposed model is evaluated on two large-scale movie datasets (MPII Corpus and Montreal Video Description) and one YouTube dataset (Microsoft Video Description Corpus) and achieves state-of-the-art performance. Furthermore, we extend the proposed model to automatic American Sign Language (ASL) recognition. To evaluate the model on this novel application, we collected a new dataset for ASL video description from YouTube videos. Results on this dataset indicate that the proposed framework is promising for ASL recognition and will significantly benefit independent communication between ASL users and others.

Bibliographic details

Source: European Conference on Computer Vision, 2016, p. 922 (16 pages)
Authors: Chenyang Zhang; Yingli Tian
Original format: PDF
Chinese Library Classification: TP391.41-53
Keywords: Video captioning; Long-short-term-memory; Sequential encoding; American Sign Language
Date indexed: 2022-08-20 20:08:26
