International Conference on Advanced Computing

A Framework For Captioning The Human Interactions



Abstract

Caption generation is an emerging Artificial Intelligence challenge in which a content description is produced for a given input. Captioning combines Computer Vision methodologies for identifying the content of input images with language modeling techniques for processing text. The objective of Video Captioning is to generate a natural language sentence relevant to the content of an input video clip. In this paper, a deep learning-based encoder-decoder model is used to produce effective video captions for human actions. The caption generation model takes a video as input and generates a caption for the interactive actions performed by the humans in it. The model comprises two stages: the first stage (Encoder) extracts features using the Inception V3 Convolutional Neural Network (CNN) model, and the second stage (Decoder) uses a Long Short-Term Memory (LSTM) sequence modeling network to generate the captions. The SBU Interaction dataset is used to evaluate the framework presented in this paper. Metrics such as accuracy, recall, precision, and F-score are measured to demonstrate the performance of the model, and the Bilingual Evaluation Understudy (BLEU) score is calculated to evaluate the generated captions.
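The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal, untrained stand-in, not the paper's implementation: in the actual framework the per-frame features come from Inception V3 and the LSTM is trained on SBU Interaction captions, whereas here random numpy weights and tiny illustrative dimensions (`FEAT`, `HID`, `VOCAB`) are assumed purely to show the encoder-decoder data flow with greedy decoding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (the real InceptionV3 pooling feature is 2048-d).
FEAT, HID, VOCAB = 8, 16, 10

def encode(frames):
    """Encoder stage: stand-in for per-frame InceptionV3 features,
    mean-pooled over time into one video-level feature vector."""
    return frames.mean(axis=0)  # shape (FEAT,)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTM:
    """Minimal LSTM cell (decoder stage) with random, untrained weights."""
    def __init__(self, in_dim, hid):
        self.W = rng.normal(0, 0.1, (4 * hid, in_dim + hid))
        self.b = np.zeros(4 * hid)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)          # update cell state
        h = o * np.tanh(c)                  # emit hidden state
        return h, c

def caption(frames, max_len=5):
    """Feed the video feature plus the previous token into the LSTM
    at each step and pick the next word greedily."""
    feat = encode(frames)
    lstm = TinyLSTM(FEAT + VOCAB, HID)
    Wout = rng.normal(0, 0.1, (VOCAB, HID))  # hidden -> vocabulary logits
    h, c = np.zeros(HID), np.zeros(HID)
    tok, out = 0, []                         # token 0 as <start> (assumed)
    for _ in range(max_len):
        onehot = np.eye(VOCAB)[tok]
        h, c = lstm.step(np.concatenate([feat, onehot]), h, c)
        tok = int(np.argmax(Wout @ h))       # greedy decoding
        out.append(tok)
    return out

frames = rng.normal(size=(12, FEAT))         # 12 stand-in "frame features"
ids = caption(frames)
print(ids)                                   # a sequence of word ids
```

In the trained model the word ids would be mapped back through the vocabulary to produce the natural-language caption.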
机译:字幕生成是新兴的人工智能挑战,其中内容描述导致了给定的输入。字幕涉及用于从输入图像中识别内容的计算机视觉方法,以及用于处理文本的语言建模技术。视频字幕的目的是生成与输入视频剪辑的内容相关的自然语言句子。在本文中,基于深度学习的编码器-解码器模型已用于为人类行为提供有效的视频字幕。字幕生成模型将视频作为输入,并为人类执行的交互操作生成字幕。该模型包括两个阶段。第一阶段(编码器)使用卷积神经网络(CNN)中的Inception V3模型执行特征的提取,第二阶段(解码器)使用长短期记忆(LSTM)的序列建模神经网络用于生成字幕。 SBU Interaction数据集用于评估本文处理的框架。测量准确性,召回率,精度和F分数等指标以证明模型的性能。还将评估双语评估学习(BLEU)分数,以评估生成的字幕。
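The BLEU score used for evaluation can be illustrated with a short, simplified sentence-level implementation: the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. This is a sketch assuming a single reference caption; toolkit implementations (e.g. NLTK's `sentence_bleu`) additionally handle multiple references and smoothing.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    """Single-reference sentence BLEU: clipped n-gram precision for
    n = 1..max_n, combined by geometric mean, times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    bp = 1.0 if len(candidate) > len(reference) else \
         math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

cand = "two people are shaking hands".split()
ref  = "two persons are shaking hands".split()
print(round(bleu(cand, ref), 3))  # → 0.632
```

Here the unigram precision is 4/5 and the bigram precision is 2/4, giving sqrt(0.8 x 0.5) ≈ 0.632 with no brevity penalty since the lengths match.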
