Published in: IEEE International Conference on Image Processing

Towards Mathematical Reasoning: A Multimodal Deep Learning Approach


Abstract

This paper presents a new direction for the visual question answering (VQA) task. Given an image of a simple system of linear algebraic equations and a natural-language question about the variables in those equations, we propose an end-to-end deep learning model that produces accurate answers to questions about the values of the variables and to other related questions. Modeling the solution of simple linear equations as a VQA task is interesting because the system now requires three kinds of understanding: (a) visual understanding, to recognize digits, variables, operators, and the equals sign; (b) conceptual understanding of the symbolic meanings of coefficients, constants, variables, operators, and equality; and (c) high-level understanding of the interaction between the image and the question, in order to answer accurately. We also create an open-source dataset for this task and compare the performance of our model with different baselines.
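
To make the task concrete, the following is a minimal sketch of the symbolic answer that a question about a variable's value targets. It is not the paper's model, which is an end-to-end deep network operating on images. The example system (2x + 3y = 8, x - y = -1), the helper names `solve_2x2` and `answer`, and the question wording are all hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 with Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("system has no unique solution")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return {"x": x, "y": y}


def answer(question, solution):
    """Toy question handler (hypothetical): return the value of the
    first variable name that appears as a word in the question."""
    for token in question.lower().replace("?", " ").split():
        if token in solution:
            return solution[token]
    raise ValueError("question does not mention a known variable")


# Hypothetical system: 2x + 3y = 8 and x - y = -1
solution = solve_2x2(2, 3, 8, 1, -1, -1)
print(answer("What is the value of x?", solution))  # prints 1
```

In the setting the abstract describes, the equations arrive only as pixels, so a model has to recover this kind of answer without being handed the coefficients directly.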
