IEEE/CVF Conference on Computer Vision and Pattern Recognition

Focal Visual-Text Attention for Visual Question Answering



Abstract

Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering. However, to tackle real-life question answering over multimedia collections such as personal photos, we have to look at whole collections containing sequences of photos or videos. When answering questions over a large collection, a natural problem is to identify the snippets that support the answer. In this paper, we describe a novel neural network called the Focal Visual-Text Attention network (FVTA) for collective reasoning in visual question answering, where both visual and text sequence information, such as images and text metadata, are presented. FVTA introduces an end-to-end approach that uses a hierarchical process to dynamically determine which media and which time steps in the sequential data to focus on when answering the question. FVTA not only answers the questions well but also provides justifications, i.e., the evidence upon which its answers are based. FVTA achieves state-of-the-art performance on the MemexQA dataset and competitive results on the MovieQA dataset.

