Journal: IEEE Transactions on Big Data

AnswerNet: Learning to Answer Questions


Abstract

Multi-modal tasks such as visual question answering (VQA) are an important step towards human-level artificial intelligence. In general, the input of a VQA task consists of an image and a related question, and to answer the question correctly a model needs to extract and integrate useful information from both. In this paper, we propose a model named AnswerNet to tackle this task. In the proposed model, discriminative features are extracted from both the image and the question. Specifically, high-level image features are extracted by a state-of-the-art convolutional neural network, the Deep Residual Network. For the question, its semantic representation and the term frequencies of its distinct words are captured by a long short-term memory (LSTM) network and a bag-of-words model, respectively. A hierarchical fusion network is then proposed to effectively fuse the image features with the question features. Experimental results on three large-scale datasets, VQA, COCO-QA, and VQA2, demonstrate the effectiveness of the proposed AnswerNet.
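The abstract describes a three-branch pipeline: ResNet image features, LSTM question features, and bag-of-words term frequencies, combined by a hierarchical fusion module. The PyTorch sketch below illustrates one plausible reading of that pipeline; the layer sizes, the ResNet-152 backbone, and the exact two-stage fusion order (question views first, then image) are assumptions for illustration, since the abstract does not specify them.

```python
# A minimal sketch of the AnswerNet pipeline as described in the abstract.
# Dimensions, vocabulary size, and the concrete fusion layers are assumed.
import torch
import torch.nn as nn
import torchvision.models as models


class AnswerNetSketch(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=300, hidden_dim=512,
                 num_answers=1000):
        super().__init__()
        # High-level image features from a Deep Residual Net with the
        # classification head removed (backbone choice is an assumption).
        resnet = models.resnet152(weights=None)
        self.cnn = nn.Sequential(*list(resnet.children())[:-1])  # -> (B, 2048, 1, 1)

        # Semantic question representation from an LSTM over word embeddings.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

        # Bag-of-words term frequencies projected to the same hidden size
        # (an assumed projection; the paper's exact dimensions are not given).
        self.bow_proj = nn.Linear(vocab_size, hidden_dim)

        # Hierarchical fusion (assumed form): fuse the two question views first,
        # then fuse the result with the image features.
        self.fuse_question = nn.Linear(hidden_dim * 2, hidden_dim)
        self.fuse_all = nn.Linear(hidden_dim + 2048, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, image, question_ids, question_bow):
        img_feat = self.cnn(image).flatten(1)                  # (B, 2048)
        _, (h_n, _) = self.lstm(self.embed(question_ids))      # h_n: (1, B, H)
        lstm_feat = h_n.squeeze(0)                             # (B, H)
        bow_feat = torch.relu(self.bow_proj(question_bow))     # (B, H)

        q_feat = torch.relu(self.fuse_question(
            torch.cat([lstm_feat, bow_feat], dim=1)))          # (B, H)
        fused = torch.relu(self.fuse_all(
            torch.cat([q_feat, img_feat], dim=1)))             # (B, H)
        return self.classifier(fused)                          # answer scores
```

Given a (B, 3, 224, 224) image tensor, a (B, T) tensor of word indices, and a (B, vocab_size) term-frequency vector, the forward pass returns scores over candidate answers, which would typically be trained with cross-entropy in a classification-style VQA setup.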
