Published in: Computer Vision, IET

Generating image descriptions with multidirectional 2D long short-term memory



Abstract

Connecting visual imagery with descriptive language is a challenge at the intersection of computer vision and machine translation. To approach this problem, the authors propose a novel end-to-end model for generating image descriptions. Some early works used a convolutional neural network-long short-term memory (CNN-LSTM) model to describe images, in which a CNN encodes the input image into a feature vector and an LSTM decodes that vector into a description. Since a two-dimensional LSTM (2DLSTM) has the property of translation invariance and can encode relationships between regions of an image, the authors not only apply a CNN to extract global features of an image, but also use a multidirectional 2DLSTM to encode the feature maps extracted by the CNN into structured local features. The model is trained by maximising the likelihood of the target description sentence over the training dataset. Experiments on two challenging datasets demonstrate the accuracy of the model and the fluency of the language it learns. The authors compare the bilingual evaluation understudy (BLEU) scores and retrieval metrics of their results against current state-of-the-art scores and show improvements on Flickr30k and MS COCO.
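The 2DLSTM encoder described above can be sketched in a few lines: a cell whose state at position (i, j) depends on both its left and top neighbours, scanned over a CNN feature map from each of the four corners ("multidirectional"), with the hidden states concatenated. This is a minimal illustrative sketch assuming a common 2DLSTM gating scheme with separate forget gates for the two predecessor states; all class names, shapes, and weight layouts here are assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTM2DCell:
    """Minimal 2D LSTM cell: the state at (i, j) depends on the left
    neighbour (i, j-1) and the top neighbour (i-1, j).
    Hypothetical sketch, not the authors' code."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        concat = input_dim + 2 * hidden_dim  # [x; h_left; h_top]
        scale = 1.0 / np.sqrt(concat)
        gates = ("i", "fl", "ft", "o", "g")  # input, forget-left, forget-top, output, candidate
        self.W = {g: rng.uniform(-scale, scale, (concat, hidden_dim)) for g in gates}
        self.b = {g: np.zeros(hidden_dim) for g in gates}
        self.hidden_dim = hidden_dim

    def step(self, x, h_left, c_left, h_top, c_top):
        z = np.concatenate([x, h_left, h_top])
        act = lambda g, f: f(z @ self.W[g] + self.b[g])
        i = act("i", sigmoid)        # input gate
        fl = act("fl", sigmoid)      # forget gate for the left cell state
        ft = act("ft", sigmoid)      # forget gate for the top cell state
        o = act("o", sigmoid)        # output gate
        g = act("g", np.tanh)        # candidate cell state
        c = fl * c_left + ft * c_top + i * g
        h = o * np.tanh(c)
        return h, c

def scan_2dlstm(cell, feature_map):
    """Scan an (H, W, D) CNN feature map from top-left to bottom-right."""
    H, W, _ = feature_map.shape
    d = cell.hidden_dim
    h = np.zeros((H, W, d))
    c = np.zeros((H, W, d))
    zero = np.zeros(d)
    for i in range(H):
        for j in range(W):
            h_left, c_left = (h[i, j - 1], c[i, j - 1]) if j > 0 else (zero, zero)
            h_top, c_top = (h[i - 1, j], c[i - 1, j]) if i > 0 else (zero, zero)
            h[i, j], c[i, j] = cell.step(feature_map[i, j], h_left, c_left, h_top, c_top)
    return h

def multidirectional(cell, fmap):
    """Run four scans, one starting from each corner (by flipping the map),
    unflip the outputs, and concatenate the hidden states."""
    outs = []
    for flip_i in (False, True):
        for flip_j in (False, True):
            m = fmap[::-1] if flip_i else fmap
            m = m[:, ::-1] if flip_j else m
            o = scan_2dlstm(cell, m)
            o = o[::-1] if flip_i else o
            o = o[:, ::-1] if flip_j else o
            outs.append(o)
    return np.concatenate(outs, axis=-1)
```

In a full captioning model, the concatenated output (shape `(H, W, 4 * hidden_dim)`) would serve as the structured local features that, together with the CNN's global feature vector, condition the LSTM decoder generating the description.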
