Encoder-Decoder Architecture for Image Caption Generation

Abstract

Describing the contents of an image without human intervention is a complex task that draws on both Computer Vision and Natural Language Processing. It requires two distinct steps: understanding the contents of the image using computer vision, and converting that understanding into semantically correct sentences. The Convolutional Neural Network (CNN) is a widely used and powerful image feature extractor for object detection and image classification, and the Gated Recurrent Unit (GRU) is typically used for effective sentence generation. A combined CNN and GRU model is proposed to generate accurate image captions. The proposed model was evaluated on various datasets and its results were compared with existing work, using the BLEU evaluation metric for benchmarking. The proposed model achieves a BLEU-4 score (the higher the better) of 53.5 on the MS-COCO 2017 dataset.
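
The BLEU-4 figure quoted in the abstract is easier to interpret with the metric written out. The following is a minimal sketch of sentence-level BLEU with uniform weights over 1- to 4-grams, clipped n-gram precision, and a brevity penalty. It is not the authors' evaluation code: the function names `ngrams` and `bleu` and the example captions are hypothetical, no smoothing is applied, and the score is returned on a 0 to 1 scale, whereas published captioning results are usually corpus-level scores multiplied by 100.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing (illustrative)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any one reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Hypothetical captions, tokenized by whitespace.
reference = "a man is riding a horse on the beach".split()
candidate = "a man rides a horse on the beach".split()
score = bleu(candidate, [reference])
print(f"BLEU-4: {100 * score:.1f}")
```

An exact match with a reference scores 1.0 (100 on the percentage scale), and a candidate that shares no 4-gram with any reference scores 0 in this unsmoothed form, which is why corpus-level aggregation or smoothing is normally used in practice.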

Bibliographic details

Source
International Conference on Communication System, Computing and IT Applications, 2020, pp. 174-179 (6 pages)
Authors
Harshit Parikh; Harsh Sawant; Bhautik Parmar; Rahul Shah; Santosh Chapaneri; Deepak Jayaswal
Original format: PDF
Keywords
Artificial Intelligence; Computer Vision; Natural Language Processing

Similar literature

1. A Similarity Searching System for Biological Phenotype Images Using Deep Convolutional Encoder-decoder Architecture [J]. Bizhi Wu, Hangxiao Zhang, Limei Lin. Current Bioinformatics, 2019, No. 7.
2. Building extraction from VHR remote sensing imagery by combining an improved deep convolutional encoder-decoder architecture and historical land use vector map [J]. Feng Wenqing, Sui Haigang, Hua Li. International Journal of Remote Sensing, 2020, No. 17-18.
3. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation [J]. Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, No. 12.
4. Encoder-Decoder Architecture for Image Caption Generation [C]. Harshit Parikh, Harsh Sawant, Bhautik Parmar. International Conference on Communication System, Computing and IT Applications, 2020.
5. Generation of Humorous Caption for Cartoon Images Using Deep Learning [D]. Shanmuga Sundaram, Rajesh. 2018.
6. Asymmetric Encoder-Decoder Structured FCN Based LiDAR to Color Image Generation [O]. Hyun-Koo Kim, Kook-Yeol Yoo, Ju H. Park. 2019.
7. U-NetPlus: A Modified Encoder-Decoder U-Net Architecture for Semantic and Instance Segmentation of Surgical Instruments from Laparoscopic Images [O]. S. M. Kamrul Hasan, Cristian A. Linte. 2019.
