Annual Meeting of the Association for Computational Linguistics

Language Models for Image Captioning: The Quirks and What Works



Abstract

Two recent approaches have achieved state-of-the-art results in image captioning. The first uses a pipelined process where a set of candidate words is generated by a convolutional neural network (CNN) trained on images, and then a maximum entropy (ME) language model is used to arrange these words into a coherent sentence. The second uses the penultimate activation layer of the CNN as input to a recurrent neural network (RNN) that then generates the caption sequence. In this paper, we compare the merits of these different language modeling approaches for the first time by using the same state-of-the-art CNN as input. We examine issues in the different approaches, including linguistic irregularities, caption repetition, and data set overlap. By combining key aspects of the ME and RNN methods, we achieve a new record performance over previously published results on the benchmark COCO dataset. However, the gains we see in BLEU do not translate to human judgments.
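The second approach described above (CNN penultimate activations seeding an RNN that emits the caption word by word) can be sketched in a minimal, untrained form. All dimensions, weights, and the toy vocabulary below are illustrative assumptions, not the paper's actual model; greedy argmax decoding stands in for whatever search the authors used.

```python
import numpy as np

# Toy vocabulary and sizes; in the paper the image features come from the
# penultimate layer of a state-of-the-art CNN (assumption: 16-dim stand-in).
rng = np.random.default_rng(0)
VOCAB = ["<s>", "</s>", "a", "dog", "runs"]
V, H, F = len(VOCAB), 8, 16  # vocab size, RNN hidden size, CNN feature size

# Random weights stand in for trained parameters (untrained demo).
W_img = rng.normal(0, 0.1, (H, F))   # projects CNN features to initial state
W_hh  = rng.normal(0, 0.1, (H, H))   # recurrent weights
W_xh  = rng.normal(0, 0.1, (H, V))   # input (previous-word) weights
W_out = rng.normal(0, 0.1, (V, H))   # hidden-to-vocabulary projection

def caption(cnn_features, max_len=5):
    """Greedy decoding: CNN penultimate activations seed the RNN state."""
    h = np.tanh(W_img @ cnn_features)      # image conditions the RNN once
    tok = VOCAB.index("<s>")
    out = []
    for _ in range(max_len):
        x = np.eye(V)[tok]                 # one-hot previous token
        h = np.tanh(W_hh @ h + W_xh @ x)   # simple (Elman) RNN step
        tok = int(np.argmax(W_out @ h))    # greedy next-word choice
        if VOCAB[tok] == "</s>":
            break
        out.append(VOCAB[tok])
    return out

words = caption(rng.normal(size=F))
print(words)
```

The pipelined ME alternative differs mainly in that the CNN proposes a bag of candidate words first and the language model only orders them, whereas here the image conditions generation directly.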
