Journal: Computer Vision and Image Understanding

Visual question answering: Datasets, algorithms, and future challenges

Abstract

Visual Question Answering (VQA) is a recent problem in computer vision and natural language processing that has garnered a large amount of interest from the deep learning, computer vision, and natural language processing communities. In VQA, an algorithm needs to answer text-based questions about images. Since the release of the first VQA dataset in 2014, additional datasets have been released and many algorithms have been proposed. In this review, we critically examine the current state of VQA in terms of problem formulation, existing datasets, evaluation metrics, and algorithms. In particular, we discuss the limitations of current datasets with regard to their ability to properly train and assess VQA algorithms. We then exhaustively review existing algorithms for VQA. Finally, we discuss possible future directions for VQA and image understanding research.
