International Conference on Advanced Computing

A Framework For Captioning The Human Interactions



Abstract

Caption generation is an emerging Artificial Intelligence challenge in which a content description is produced for a given input. Captioning combines Computer Vision methodologies for identifying the content of input images with language modeling techniques for processing text. The objective of Video Captioning is to generate a natural language sentence relevant to the content of an input video clip. In this paper, a deep learning-based encoder-decoder model is used to produce effective video captions for human actions. The caption generation model takes a video as input and generates a caption for the interactive actions performed by the humans in it. The model comprises two stages: the first stage (Encoder) extracts features using the Inception V3 Convolutional Neural Network (CNN) model, and the second stage (Decoder) uses a Long Short-Term Memory (LSTM) sequence modeling network to generate the captions. The SBU Interaction dataset is used to evaluate the framework presented in this paper. Metrics such as accuracy, recall, precision, and F-score are measured to demonstrate the performance of the model, and the Bilingual Evaluation Understudy (BLEU) score is calculated to evaluate the generated captions.
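The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal, untrained stand-in, not the paper's implementation: in the actual framework the per-frame features come from Inception V3 and the LSTM is trained on SBU Interaction captions, whereas here random numpy weights and tiny illustrative dimensions (`FEAT`, `HID`, `VOCAB`) are assumed purely to show the encoder-decoder data flow with greedy decoding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (the real InceptionV3 pooling feature is 2048-d).
FEAT, HID, VOCAB = 8, 16, 10

def encode(frames):
    """Encoder stage: stand-in for per-frame InceptionV3 features,
    mean-pooled over time into one video-level feature vector."""
    return frames.mean(axis=0)  # shape (FEAT,)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTM:
    """Minimal LSTM cell (decoder stage) with random, untrained weights."""
    def __init__(self, in_dim, hid):
        self.W = rng.normal(0, 0.1, (4 * hid, in_dim + hid))
        self.b = np.zeros(4 * hid)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)          # update cell state
        h = o * np.tanh(c)                  # emit hidden state
        return h, c

def caption(frames, max_len=5):
    """Feed the video feature plus the previous token into the LSTM
    at each step and pick the next word greedily."""
    feat = encode(frames)
    lstm = TinyLSTM(FEAT + VOCAB, HID)
    Wout = rng.normal(0, 0.1, (VOCAB, HID))  # hidden -> vocabulary logits
    h, c = np.zeros(HID), np.zeros(HID)
    tok, out = 0, []                         # token 0 as <start> (assumed)
    for _ in range(max_len):
        onehot = np.eye(VOCAB)[tok]
        h, c = lstm.step(np.concatenate([feat, onehot]), h, c)
        tok = int(np.argmax(Wout @ h))       # greedy decoding
        out.append(tok)
    return out

frames = rng.normal(size=(12, FEAT))         # 12 stand-in "frame features"
ids = caption(frames)
print(ids)                                   # a sequence of word ids
```

In the trained model the word ids would be mapped back through the vocabulary to produce the natural-language caption.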
机译:字幕生成是新兴的人工智能挑战,其中内容描述导致了给定的输入。字幕涉及用于从输入图像中识别内容的计算机视觉方法,以及用于处理文本的语言建模技术。视频字幕的目的是生成与输入视频剪辑的内容相关的自然语言句子。在本文中,基于深度学习的编码器-解码器模型已用于为人类行为提供有效的视频字幕。字幕生成模型将视频作为输入,并为人类执行的交互操作生成字幕。该模型包括两个阶段。第一阶段(编码器)使用卷积神经网络(CNN)中的Inception V3模型执行特征的提取,第二阶段(解码器)使用长短期记忆(LSTM)的序列建模神经网络用于生成字幕。 SBU Interaction数据集用于评估本文处理的框架。测量准确性,召回率,精度和F分数等指标以证明模型的性能。还将评估双语评估学习(BLEU)分数,以评估生成的字幕。
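The BLEU score used for evaluation can be illustrated with a short, simplified sentence-level implementation: the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. This is a sketch assuming a single reference caption; toolkit implementations (e.g. NLTK's `sentence_bleu`) additionally handle multiple references and smoothing.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    """Single-reference sentence BLEU: clipped n-gram precision for
    n = 1..max_n, combined by geometric mean, times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    bp = 1.0 if len(candidate) > len(reference) else \
         math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

cand = "two people are shaking hands".split()
ref  = "two persons are shaking hands".split()
print(round(bleu(cand, ref), 3))  # → 0.632
```

Here the unigram precision is 4/5 and the bigram precision is 2/4, giving sqrt(0.8 x 0.5) ≈ 0.632 with no brevity penalty since the lengths match.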
