
Neural Baby Talk

Abstract

We introduce a novel framework for image captioning that can produce natural language explicitly grounded in entities that object detectors find in the image. Our approach reconciles classical slot filling approaches (that are generally better grounded in images) with modern neural captioning approaches (that are generally more natural sounding and accurate). Our approach first generates a sentence 'template' with slot locations explicitly tied to specific image regions. These slots are then filled in by visual concepts identified in the regions by object detectors. The entire architecture (sentence template generation and slot filling with object detectors) is end-to-end differentiable. We verify the effectiveness of our proposed model on different image captioning tasks. On standard image captioning and novel object captioning, our model reaches state-of-the-art on both COCO and Flickr30k datasets. We also demonstrate that our model has unique advantages when the train and test distributions of scene compositions - and hence language priors of associated captions - are different. Code has been made available at: https://github.com/jiasenlu/NeuralBabyTalk.
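To make the two-stage idea concrete, below is a minimal, hypothetical PyTorch sketch, not the authors' released code at the GitHub link above, of a decoder that at each step chooses either a textual word or a "visual" slot pointing at a detected region; chosen slots are then filled with the detector's category labels. All class names, dimensions, and the fill_slots helper are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SlotFillingDecoder(nn.Module):
    """Toy decoder: each step scores ordinary vocabulary words and detected
    regions in one joint softmax; region choices become slots in the template."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, region_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTMCell(embed_dim, hidden_dim)
        self.word_head = nn.Linear(hidden_dim, vocab_size)    # scores for textual words
        self.region_proj = nn.Linear(region_dim, hidden_dim)  # keys for pointer scores

    def step(self, prev_word, state, region_feats):
        # prev_word: (B,) word ids; region_feats: (B, R, region_dim) from a detector
        h, c = self.rnn(self.embed(prev_word), state)
        word_logits = self.word_head(h)                              # (B, V)
        keys = self.region_proj(region_feats)                        # (B, R, H)
        region_logits = torch.bmm(keys, h.unsqueeze(2)).squeeze(2)   # (B, R)
        # Joint distribution over "emit a word" vs. "point at region r".
        log_probs = F.log_softmax(torch.cat([word_logits, region_logits], dim=1), dim=1)
        return log_probs, (h, c)


def fill_slots(token_ids, vocab, region_classes, vocab_size):
    """Turn decoded ids into a caption: ids >= vocab_size are visual slots,
    filled with the detector's class name for the pointed-to region."""
    out = []
    for t in token_ids:
        out.append(vocab[t] if t < vocab_size else region_classes[t - vocab_size])
    return " ".join(out)


# Tiny usage example with random features standing in for detector outputs.
if __name__ == "__main__":
    vocab = ["<s>", "a", "sitting", "on", "the", "with", "next", "to"]
    dec = SlotFillingDecoder(vocab_size=len(vocab))
    feats = torch.randn(1, 3, 2048)                  # 3 detected regions
    log_probs, state = dec.step(torch.tensor([0]), None, feats)
    print(log_probs.shape)                           # torch.Size([1, 11]): 8 words + 3 regions
    print(fill_slots([1, 8, 2, 3, 4, 10], vocab, ["dog", "cake", "table"], len(vocab)))
    # -> "a dog sitting on the table"
```

Because the word scores and the region pointer scores share a single softmax, both the template generation and the slot choices receive gradients, which is the property the abstract refers to as end-to-end differentiability; this sketch only illustrates that joint word-or-region choice, not the paper's full attention and refinement machinery.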
