Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases

The Search for Equations — Learning to Identify Similarities Between Mathematical Expressions

Abstract

When you search for scientific articles relevant to your research question, you judge the relevance of each mathematical expression you stumble upon using extensive background knowledge about the domain, its problems and its notations. We wonder if machine learning can support this process and work toward implementing a search engine for mathematical expressions in scientific publications. Thousands of scientific publications with millions of mathematical expressions or equations are accessible at arXiv.org. We want to use this data to learn about equations, their distribution and their relations in order to find similar equations. To this end we propose an embedding model based on convolutional neural networks that maps bitmap images of equations into a low-dimensional vector space where similarity is evaluated via the dot product. However, no annotated similarity data are available to train this mapping. We mitigate this by proposing a number of different unsupervised proxy tasks that use available features as weak labels. We evaluate our system using a number of metrics, including results on a small hand-labeled subset of equations. In addition, we show and discuss a number of result sets for some sample queries. The results show that we are able to automatically identify related mathematical expressions. Our dataset is published at https://whadup.github.io/EquationLearning/ and we invite the community to use it.
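The retrieval step described in the abstract, ranking equations by the dot product of their embedding vectors, can be illustrated with a minimal sketch. It assumes the embeddings have already been computed by some trained model; the convolutional network itself is not shown. The function names (`dot`, `rank_by_similarity`), the three-dimensional toy vectors and the labels `eq_a`, `eq_b`, `eq_c` are hypothetical placeholders chosen for illustration, not part of the authors' code or data.

```python
from typing import Dict, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def rank_by_similarity(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (name, score) pairs, highest dot product first."""
    scored = [(name, dot(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings: in the setting of the abstract, a trained CNN would
# produce such vectors from bitmap images of equations. These values are
# made-up placeholders, not results from the paper.
toy_index = {
    "eq_a": [1.0, 0.0, 0.0],
    "eq_b": [0.5, 0.5, 0.0],
    "eq_c": [0.0, 0.0, 1.0],
}
toy_query = [1.0, 0.5, 0.0]

results = rank_by_similarity(toy_query, toy_index, top_k=2)
print(results)
```

With these toy vectors the query scores highest against `eq_a`, then `eq_b`, while `eq_c` is orthogonal to the query and drops out of the top two. A real index over millions of equations would likely need a vectorized or approximate nearest-neighbor search rather than this linear scan.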
