Journal: IEEE Transactions on Image Processing

Holistic Multi-Modal Memory Network for Movie Question Answering

Abstract

Answering questions using multi-modal context is a challenging problem, as it requires a deep integration of diverse data sources. Existing approaches consider only a subset of all possible interactions among data sources during one attention hop. In this paper, we present a holistic multi-modal memory network (HMMN) framework that fully considers interactions between different input sources (multi-modal context and question) at each hop. In addition, to home in on relevant information, our framework takes answer choices into consideration during the context retrieval stage. Our HMMN framework effectively integrates information from the multi-modal context, question, and answer choices, enabling more informative context to be retrieved for question answering. Experimental results on the MovieQA and TVQA datasets validate the effectiveness of our HMMN framework. Extensive ablation studies show the importance of holistic reasoning and reveal the contributions of different attention strategies to model performance.
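
The abstract does not spell out the architecture, so the following is only a generic sketch of the idea it describes, not the HMMN model itself. It assumes that the question, the answer choices, and each video or subtitle memory slot are already embedded as vectors of the same size. It also assumes dot-product attention, a residual query update, and an answer-aware query formed by adding the answer vector to the question vector. All function names (`attend`, `hop`, `answer_scores`) are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(query, memory):
    # Soft attention: weight each memory slot by its similarity to the query
    # and return the weighted sum of the slots.
    weights = softmax([dot(query, slot) for slot in memory])
    return [sum(w * slot[i] for w, slot in zip(weights, memory))
            for i in range(len(query))]

def hop(query, video, subtitles):
    # One illustrative hop: the query attends over both modalities at once
    # (assumption: modalities share one embedding space and are concatenated).
    context = attend(query, video + subtitles)
    # Residual update so later hops refine the same query (assumption).
    return [q + c for q, c in zip(query, context)]

def answer_scores(question, video, subtitles, answers, hops=2):
    # Answer-aware retrieval: each candidate answer shapes the query that
    # retrieves context, then is scored against the refined query.
    scores = []
    for ans in answers:
        query = [q + a for q, a in zip(question, ans)]
        for _ in range(hops):
            query = hop(query, video, subtitles)
        scores.append(dot(query, ans))
    return softmax(scores)
```

Calling `answer_scores(question, video, subtitles, answers)` returns one probability per answer choice. In a trained model the embeddings and any projection matrices would be learned; here they are plain inputs, which keeps the sketch focused on how context retrieval can depend on the question and the answer choices together.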
