
Diverse Beam Search for Improved Description of Complex Scenes

Abstract

A single image captures the appearance and position of multiple entities in a scene as well as their complex interactions. As a consequence, natural language grounded in visual contexts tends to be diverse, with utterances differing as focus shifts to specific objects, interactions, or levels of detail. Recently, neural sequence models such as RNNs and LSTMs have been employed to produce visually-grounded language. Beam Search (BS), the standard workhorse for decoding sequences from these models, is an approximate inference algorithm that decodes the top-B sequences in a greedy left-to-right fashion. In practice, the resulting sequences are often minor rewordings of a common utterance, failing to capture the multimodal nature of source images. To address this shortcoming, we propose Diverse Beam Search (DBS), a diversity-promoting alternative to BS for approximate inference. DBS produces sequences that are significantly different from each other by incorporating diversity constraints within groups of candidate sequences during decoding; moreover, it achieves this with minimal computational or memory overhead. We demonstrate that our method improves both the diversity and the quality of decoded sequences over existing techniques on two visually-grounded language generation tasks (image captioning and visual question generation), particularly on complex scenes containing diverse visual content. We also show similar improvements on language-only machine translation tasks, highlighting the generality of our approach.
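The abstract does not spell out the diversity constraint, so the following is only a minimal sketch of one possible reading: the beam is split into equal-sized groups, groups are decoded one after another at each time step, and a candidate token is penalized once for every sequence in an earlier group that already chose that token at the same step. The penalty form, the function names (`diverse_beam_search`, `toy_log_probs`), the parameter `diversity_strength`, and the context-free toy model standing in for an RNN/LSTM decoder are all assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter

VOCAB = ["a", "b", "c", "d"]

def toy_log_probs(prefix):
    """Hypothetical stand-in for a neural decoder.

    Returns a fixed log-probability for each token, ignoring the prefix.
    """
    probs = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
    return {tok: math.log(p) for tok, p in probs.items()}

def diverse_beam_search(log_prob_fn, vocab, beam_width, num_groups,
                        steps, diversity_strength):
    """Grouped beam search with a token-reuse penalty (assumed form).

    Returns a list of (token_tuple, log_prob) pairs, group by group.
    The returned log_prob is the model score without the penalty.
    """
    assert beam_width % num_groups == 0, "groups must have equal size"
    per_group = beam_width // num_groups
    # Every group starts from the empty sequence with log-probability 0.
    groups = [[((), 0.0)] for _ in range(num_groups)]

    for _ in range(steps):
        # Counts tokens picked at this step by groups already processed.
        used = Counter()
        for g in range(num_groups):
            candidates = []
            for seq, score in groups[g]:
                log_probs = log_prob_fn(seq)
                for tok in vocab:
                    new_score = score + log_probs[tok]
                    # Assumed penalty: proportional to prior use of the token.
                    ranked = new_score - diversity_strength * used[tok]
                    candidates.append((ranked, new_score, seq + (tok,)))
            # Keep the best candidates by penalized score.
            candidates.sort(key=lambda c: c[0], reverse=True)
            groups[g] = [(seq, s) for _, s, seq in candidates[:per_group]]
            for seq, _ in groups[g]:
                used[seq[-1]] += 1

    return [item for group in groups for item in group]

if __name__ == "__main__":
    for seq, score in diverse_beam_search(
            toy_log_probs, VOCAB, beam_width=2, num_groups=2,
            steps=2, diversity_strength=10.0):
        print(" ".join(seq), round(score, 3))
```

In this sketch, setting `num_groups=1` removes the penalty entirely (no earlier group exists), so the procedure reduces to ordinary beam search. The extra bookkeeping is a single token counter per time step, which is in the spirit of the low overhead the abstract describes.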
