Holistic Multi-Modal Memory Network for Movie Question Answering

Wang Anran; Anh Tuan Luu; Foo Chuan-Sheng; Zhu Hongyuan; Tay Yi; Chandrasekhar Vijay

Abstract

Answering questions using multi-modal context is a challenging problem, as it requires a deep integration of diverse data sources. Existing approaches only consider a subset of all possible interactions among data sources during one attention hop. In this paper, we present a holistic multi-modal memory network (HMMN) framework that fully considers interactions between different input sources (multi-modal context and question) at each hop. In addition, to hone in on relevant information, our framework takes answer choices into consideration during the context retrieval stage. Our HMMN framework effectively integrates information from the multi-modal context, question, and answer choices, enabling more informative context to be retrieved for question answering. Experimental results on the MovieQA and TVQA datasets validate the effectiveness of our HMMN framework. Extensive ablation studies show the importance of holistic reasoning and reveal the contributions of different attention strategies to model performance.

Bibliographic record

Source
IEEE Transactions on Image Processing, 2020, issue 2020, pp. 489-499 (11 pages)
Authors
Wang Anran; Anh Tuan Luu; Foo Chuan-Sheng; Zhu Hongyuan; Tay Yi; Chandrasekhar Vijay
Author affiliations

ASTAR Inst Infocomm Res Singapore 138632 Singapore;

ASTAR Inst Infocomm Res Singapore 138632 Singapore;

ASTAR Inst Infocomm Res Singapore 138632 Singapore;

ASTAR Inst Infocomm Res Singapore 138632 Singapore;

Nanyang Technol Univ Sch Comp Sci & Engn Singapore 639798 Singapore;

ASTAR Inst Infocomm Res Singapore 138632 Singapore;

Original format: PDF
Language: English
Keywords
Knowledge discovery; Visualization; Videos; Hidden Markov models; Task analysis; Motion pictures; Semantics; Question answering; multi-modal learning; MovieQA;
