Annual Meeting of the Association for Computational Linguistics

Language Models for Image Captioning: The Quirks and What Works



Abstract

Two recent approaches have achieved state-of-the-art results in image captioning. The first uses a pipelined process where a set of candidate words is generated by a convolutional neural network (CNN) trained on images, and then a maximum entropy (ME) language model is used to arrange these words into a coherent sentence. The second uses the penultimate activation layer of the CNN as input to a recurrent neural network (RNN) that then generates the caption sequence. In this paper, we compare the merits of these different language modeling approaches for the first time by using the same state-of-the-art CNN as input. We examine issues in the different approaches, including linguistic irregularities, caption repetition, and data set overlap. By combining key aspects of the ME and RNN methods, we achieve a new record performance over previously published results on the benchmark COCO dataset. However, the gains we see in BLEU do not translate to human judgments.
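The second approach described above (CNN penultimate activations seeding an RNN that emits the caption word by word) can be sketched in a minimal, untrained form. All dimensions, weights, and the toy vocabulary below are illustrative assumptions, not the paper's actual model; greedy argmax decoding stands in for whatever search the authors used.

```python
import numpy as np

# Toy vocabulary and sizes; in the paper the image features come from the
# penultimate layer of a state-of-the-art CNN (assumption: 16-dim stand-in).
rng = np.random.default_rng(0)
VOCAB = ["<s>", "</s>", "a", "dog", "runs"]
V, H, F = len(VOCAB), 8, 16  # vocab size, RNN hidden size, CNN feature size

# Random weights stand in for trained parameters (untrained demo).
W_img = rng.normal(0, 0.1, (H, F))   # projects CNN features to initial state
W_hh  = rng.normal(0, 0.1, (H, H))   # recurrent weights
W_xh  = rng.normal(0, 0.1, (H, V))   # input (previous-word) weights
W_out = rng.normal(0, 0.1, (V, H))   # hidden-to-vocabulary projection

def caption(cnn_features, max_len=5):
    """Greedy decoding: CNN penultimate activations seed the RNN state."""
    h = np.tanh(W_img @ cnn_features)      # image conditions the RNN once
    tok = VOCAB.index("<s>")
    out = []
    for _ in range(max_len):
        x = np.eye(V)[tok]                 # one-hot previous token
        h = np.tanh(W_hh @ h + W_xh @ x)   # simple (Elman) RNN step
        tok = int(np.argmax(W_out @ h))    # greedy next-word choice
        if VOCAB[tok] == "</s>":
            break
        out.append(VOCAB[tok])
    return out

words = caption(rng.normal(size=F))
print(words)
```

The pipelined ME alternative differs mainly in that the CNN proposes a bag of candidate words first and the language model only orders them, whereas here the image conditions generation directly.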
