IEEE/CVF Conference on Computer Vision and Pattern Recognition

Focal Visual-Text Attention for Visual Question Answering



Abstract

Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering. However, to tackle real-life question answering over multimedia collections such as personal photos, we have to reason over whole collections containing sequences of photos or videos. When answering questions from a large collection, a natural problem is to identify the snippets that support the answer. In this paper, we describe a novel neural network called the Focal Visual-Text Attention network (FVTA) for collective reasoning in visual question answering, where both visual and text sequences, such as images and their text metadata, are present. FVTA introduces an end-to-end approach that uses a hierarchical process to dynamically determine which medium and which time steps to focus on in the sequential data to answer the question. FVTA not only answers the questions well but also provides the justifications on which its answers are based. FVTA achieves state-of-the-art performance on the MemexQA dataset and competitive results on the MovieQA dataset.
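The hierarchical attention the abstract describes (first attend over time within each sequence, then across sequences/media) can be sketched as follows. This is a minimal illustration with hypothetical names (`focal_attention`, dot-product scoring), not the paper's actual architecture, which the abstract does not specify in detail:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def focal_attention(question, sequences):
    """Two-level (focal) attention sketch: attend over time steps
    within each sequence, then over the sequences themselves."""
    contexts, seq_scores = [], []
    for seq in sequences:                  # seq: (T, d) features of one medium
        scores = seq @ question            # (T,) relevance of each time step
        alpha = softmax(scores)            # temporal attention weights
        ctx = alpha @ seq                  # (d,) attended summary of the sequence
        contexts.append(ctx)
        seq_scores.append(ctx @ question)  # relevance of the whole sequence
    beta = softmax(np.array(seq_scores))   # cross-sequence (media) weights
    return beta @ np.stack(contexts)       # (d,) fused context for answering

# toy example: one photo sequence and one text-metadata sequence
q = np.array([1.0, 0.0, 0.0])
photos = np.array([[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]])
texts = np.array([[0.2, 0.2, 0.6], [0.7, 0.2, 0.1]])
out = focal_attention(q, [photos, texts])
```

The intermediate weights `alpha` and `beta` also serve as the kind of justification the abstract mentions: they indicate which time step and which medium the answer is grounded in.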
