International Conference on Computer Vision

Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning


Abstract

Diverse and accurate vision+language modeling is an important goal to retain creative freedom and maintain user engagement. However, adequately capturing the intricacies of diversity in language models is challenging. Recent works commonly resort to latent variable models augmented with more or less supervision from object detectors or part-of-speech tags. Common to all these methods is that the latent variable either only initializes the sentence generation process or is identical across the steps of generation; neither option offers fine-grained control. To address this concern, we propose Seq-CVAE, which learns a latent space for every word. We encourage this temporal latent space to capture the 'intention' about how to complete the sentence by mimicking a representation which summarizes the future. We illustrate the efficacy of the proposed approach on the challenging MSCOCO dataset, significantly improving diversity metrics compared to baselines while performing on par with respect to sentence quality.
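
To make the idea concrete, below is a minimal, illustrative sketch of a sequential CVAE caption decoder in PyTorch. It is not the authors' exact Seq-CVAE architecture: the module names (SeqLatentCaptioner, future_enc, prior_net, post_net), the dimensions, and the choice of a backward GRU as the "summary of the future" that the per-word posterior conditions on are assumptions made for illustration. The sketch only captures the core mechanism described in the abstract: one latent variable per word, a prior conditioned on the decoding history, a posterior that additionally sees the remaining words, and a per-step KL term that ties the two together.

# Minimal sketch of a sequential CVAE decoder for captioning (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SeqLatentCaptioner(nn.Module):
    """One Gaussian latent variable per word; the posterior sees a summary of the future."""

    def __init__(self, vocab_size=10000, embed_dim=256, hidden_dim=512,
                 latent_dim=64, image_dim=2048):
        super().__init__()
        self.latent_dim = latent_dim
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.img_proj = nn.Linear(image_dim, hidden_dim)
        # Forward decoder state over the already-generated prefix (the "past").
        self.decoder = nn.GRUCell(embed_dim + latent_dim, hidden_dim)
        # Backward GRU over the remaining words (the "future"); used only during training.
        self.future_enc = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Per-step prior p(z_t | past) and posterior q(z_t | past, future).
        self.prior_net = nn.Linear(hidden_dim, 2 * latent_dim)
        self.post_net = nn.Linear(2 * hidden_dim, 2 * latent_dim)
        self.out = nn.Linear(hidden_dim + latent_dim, vocab_size)

    @staticmethod
    def _sample(stats):
        mu, logvar = stats.chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        return z, mu, logvar

    def forward(self, image_feat, captions):
        # image_feat: (B, image_dim); captions: (B, T) word indices (teacher forcing).
        B, T = captions.shape
        emb = self.embed(captions)                              # (B, T, E)
        # Run a GRU over the reversed caption so future_sum[:, t] summarizes words t..T-1.
        rev, _ = self.future_enc(emb.flip(1))
        future_sum = rev.flip(1)                                # (B, T, H)

        h = torch.tanh(self.img_proj(image_feat))               # initial decoder state from the image
        logits, kls = [], []
        for t in range(T):
            mu_p, logvar_p = self.prior_net(h).chunk(2, dim=-1)
            z, mu_q, logvar_q = self._sample(
                self.post_net(torch.cat([h, future_sum[:, t]], dim=-1)))
            # KL(q || p) between two diagonal Gaussians, one term per time step.
            kl = 0.5 * (logvar_p - logvar_q
                        + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp() - 1).sum(-1)
            kls.append(kl)
            logits.append(self.out(torch.cat([h, z], dim=-1)))  # predict the word at position t
            h = self.decoder(torch.cat([emb[:, t], z], dim=-1), h)
        return torch.stack(logits, dim=1), torch.stack(kls, dim=1)


if __name__ == "__main__":
    model = SeqLatentCaptioner()
    img = torch.randn(2, 2048)
    caps = torch.randint(0, 10000, (2, 12))
    logits, kl = model(img, caps)
    recon = F.cross_entropy(logits.reshape(-1, logits.size(-1)), caps.reshape(-1))
    loss = recon + 0.1 * kl.mean()                              # ELBO-style objective with a KL weight
    print(float(loss))

At inference time z_t would be drawn from the per-step prior instead of the posterior, which is what allows diverse sentence completions for the same image.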

