Journal: IEEE Transactions on Circuits and Systems for Video Technology

Movie Question Answering via Textual Memory and Plot Graph

Abstract

Movies provide a large amount of visual content together with engaging stories. Existing methods have shown that understanding movie stories from visual content alone remains a hard problem. In this paper, to answer questions about movies, we introduce a new dataset called PlotGraphs as external knowledge. The dataset contains a large amount of graph-based information about movies. In addition, we propose a model that can utilize movie clips, subtitles, and graph-based external knowledge. The model contains two main parts: a layered memory network (LMN) and a plot graph representation network (PGRN). In particular, the LMN represents frame-level and clip-level movie content with the fixed word memory module and the adaptive subtitle memory module, respectively, while the PGRN represents the entire graph. We first extract words and sentences from the training movie subtitles and then learn hierarchically formed movie representations with the LMN. At the same time, the PGRN represents the semantic information and the relationships in the graph. We conduct extensive experiments on the MovieQA dataset and the PlotGraphs dataset. With only visual content as input, the LMN with frame-level representation obtains a large performance improvement. When subtitles are incorporated into the LMN to form the clip-level representation, we achieve state-of-the-art performance on the online evaluation task of "Video+Subtitles." After external knowledge is integrated, the performance of the model consisting of the LMN and the PGRN improves further. These results demonstrate that the external knowledge and the proposed model are effective for movie understanding.
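
The abstract does not spell out how the memory modules are read, so the following is only a minimal sketch of a generic soft-attention memory read, the kind of operation a word memory or subtitle memory could plausibly use. The function names (`softmax`, `dot`, `memory_read`), the dot-product scoring, and the toy vectors are all assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def memory_read(query, memory):
    """Return the attention-weighted sum of memory slots and the weights.

    Assumption: slots are scored by dot product with the query; the
    paper's actual scoring function is not given in the abstract.
    """
    weights = softmax([dot(query, slot) for slot in memory])
    dim = len(memory[0])
    read = [sum(w * slot[i] for w, slot in zip(weights, memory))
            for i in range(dim)]
    return read, weights


# Hypothetical toy data: three word embeddings and one frame feature.
word_memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frame_feature = [2.0, 0.0]

# Frame-level representation: the frame feature re-expressed via word memory.
frame_repr, weights = memory_read(frame_feature, word_memory)
```

In this toy setup the first and third words score equally against the frame feature and receive more weight than the second. A clip-level read could reuse `memory_read` with subtitle sentence embeddings as the memory, again as an assumption rather than the paper's stated design.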
