Venue: European Conference on Computer Vision

Automatic Video Captioning via Multi-channel Sequential Encoding

Abstract

In this paper, we propose a novel two-stage video captioning framework composed of (1) a multi-channel video encoder and (2) a sentence-generating language decoder. Both the encoder and the decoder are based on recurrent neural networks with long short-term memory (LSTM) cells. Our system can take videos of arbitrary length as input. Compared with previous sequence-to-sequence video captioning frameworks, the proposed model can handle multiple channels of video representations and jointly learns how to combine them. The proposed model is evaluated on two large-scale movie datasets (MPII Corpus and Montreal Video Description) and one YouTube dataset (Microsoft Video Description Corpus), and achieves state-of-the-art performance. Furthermore, we extend the proposed model to automatic American Sign Language (ASL) recognition. To evaluate the performance of our model on this novel application, a new dataset for ASL video description is collected from YouTube videos. Results on this dataset indicate that the proposed framework is promising for ASL recognition and will significantly benefit independent communication between ASL users and others.
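
The abstract does not give the encoder's equations, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes one LSTM per feature channel (for example, appearance and motion features), takes each channel's final hidden state, and fuses the states with a softmax-weighted sum whose logits stand in for the jointly learned combination. The names `LSTMCell`, `encode_channel`, `fuse_channels`, and `encode_video` are hypothetical, the weights are random and untrained, bias terms are omitted for brevity, and the language decoder is not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LSTMCell:
    """Minimal LSTM cell with random, untrained weights (biases omitted)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size

        def init():
            # One weight matrix per gate, applied to the concatenation [x; h].
            return [[rng.uniform(-0.1, 0.1)
                     for _ in range(input_size + hidden_size)]
                    for _ in range(hidden_size)]

        self.w_i, self.w_f, self.w_o, self.w_g = init(), init(), init(), init()

    def step(self, x, h, c):
        xh = list(x) + list(h)
        i = [sigmoid(a) for a in matvec(self.w_i, xh)]    # input gate
        f = [sigmoid(a) for a in matvec(self.w_f, xh)]    # forget gate
        o = [sigmoid(a) for a in matvec(self.w_o, xh)]    # output gate
        g = [math.tanh(a) for a in matvec(self.w_g, xh)]  # candidate values
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def encode_channel(cell, frames):
    """Run one channel's frame features through its LSTM, whatever the length."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for x in frames:
        h, c = cell.step(x, h, c)
    return h


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_channels(channel_states, channel_logits):
    """Weighted sum of per-channel states; the logits would be trained jointly."""
    weights = softmax(channel_logits)
    size = len(channel_states[0])
    return [sum(w * state[k] for w, state in zip(weights, channel_states))
            for k in range(size)]


def encode_video(channels, cells, channel_logits):
    states = [encode_channel(cell, frames)
              for cell, frames in zip(cells, channels)]
    return fuse_channels(states, channel_logits)


# Usage: two hypothetical channels with different feature sizes, 7 frames each.
rng = random.Random(0)
cells = [LSTMCell(3, 4, rng), LSTMCell(5, 4, rng)]
appearance = [[rng.random() for _ in range(3)] for _ in range(7)]
motion = [[rng.random() for _ in range(5)] for _ in range(7)]
video_vector = encode_video([appearance, motion], cells, [0.0, 0.0])
```

In a full captioning system of this kind, a fixed-size vector such as `video_vector` would typically initialise or condition the decoder LSTM that generates the sentence word by word. Because the encoder loops over frames, the output size stays the same for any video length.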
