IEEE International Conference on Multimedia and Expo

Multimodal Image Captioning Through Combining Reinforced Cross Entropy Loss and Stochastic Deprecation

Abstract

Cross Entropy Loss (CEL) has recently proven useful in encoder-decoder based multimodal image captioning; however, it suffers from an inconsistency between the training objective and the evaluation metrics. In this paper, we propose a new approach to multimodal image captioning. It consists of 1) Reinforced Cross Entropy Loss (RCEL), which maximizes the probability of ground-truth captions while optimizing the evaluation metrics directly, and 2) Stochastic Deprecation (SD), which automatically selects high-quality ground-truth sentences without losing the diversity of the corpus. The proposed RCEL and SD are generic and can improve existing natural language generation models, while combining them (RCEL-SD) achieves the best results. Experimental results on the benchmark MSCOCO dataset show that the proposed RCEL-SD outperforms CEL on all 7 evaluation metrics for each of three recent image captioning models.
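
The abstract gives no formulas for RCEL or SD, so the following is only a rough sketch of the two ideas it names, not the authors' formulation. It assumes a REINFORCE-style reward term blended with cross entropy through a hypothetical mixing coefficient `weight`, and assumes each ground-truth caption has a hypothetical quality score in [0, 1] that is used as its keep probability. All function and parameter names are illustrative.

```python
import math
import random

def cross_entropy(token_probs):
    """Mean negative log-likelihood of the ground-truth tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def mixed_loss(gt_token_probs, sampled_token_probs, reward, baseline, weight=0.5):
    """Hypothetical blend of cross entropy and a REINFORCE-style term.

    `weight` is an assumed mixing coefficient, not a value from the paper.
    """
    ce = cross_entropy(gt_token_probs)
    # Policy-gradient surrogate: the loss drops when a sample whose metric
    # reward beats the baseline is made more likely.
    rl = -(reward - baseline) * sum(math.log(p) for p in sampled_token_probs)
    return (1.0 - weight) * ce + weight * rl

def stochastic_keep(references, scores, rng):
    """Keep each reference with probability equal to its assumed quality score."""
    kept = [ref for ref, s in zip(references, scores) if rng.random() < s]
    # Fall back to the best-scored reference so an image never loses all captions.
    return kept or [max(zip(references, scores), key=lambda pair: pair[1])[0]]
```

In this sketch, `reward` would be an evaluation metric score (for example CIDEr) of a sampled caption, and random rather than hard filtering in `stochastic_keep` is one way to keep lower-scored captions in play so the corpus stays diverse.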