
Encoder-Decoder Architecture for Image Caption Generation



Abstract

Describing the contents of an image without human intervention is a complex task, and Computer Vision and Natural Language Processing are widely used to tackle it. The task requires two distinct steps: understanding the contents of the image using computer vision, and converting that understanding into semantically correct sentences. A Convolutional Neural Network (CNN) is a powerful, widely used image feature extractor for object detection and image classification. A Gated Recurrent Unit (GRU) is typically used for effective sentence generation. A combined CNN and GRU model is proposed to generate accurate image captions. The proposed model was evaluated on various datasets, and the results were compared with existing work. The BLEU metric was used for benchmarking; the proposed model achieves a BLEU-4 score (higher is better) of 53.5 on the MS-COCO 2017 dataset.
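To make the reported metric concrete, the following is a minimal sketch of sentence-level BLEU-4 on a 0 to 100 scale: the geometric mean of clipped 1-gram to 4-gram precisions, multiplied by a brevity penalty. The function names `ngrams` and `bleu4` are illustrative and do not come from the paper. The sketch applies no smoothing and scores a single caption, whereas published scores are normally corpus-level values produced by a standard evaluation toolkit, so it should not be expected to reproduce the 53.5 figure.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, references):
    """Sentence-level BLEU-4 (0-100), unsmoothed; illustrative helper only."""
    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        # For each n-gram, find its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        # Clip candidate counts so repeated n-grams are not over-rewarded.
        clipped = sum(min(count, max_ref[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return 100.0 * brevity_penalty * math.exp(sum(log_precisions) / 4)


reference = "a man rides a horse on the beach".split()
candidate = "a man rides a horse on a beach".split()
print(round(bleu4(candidate, [reference]), 2))
```

A caption identical to its reference scores 100, and a caption that shares no 4-gram with any reference scores 0 under this unsmoothed variant.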
