Conference: International Workshop on Multi-Disciplinary Trends in Artificial Intelligence

A Structure Based Approach for Mathematical Expression Retrieval


Abstract

The mathematical expression (ME) retrieval problem has recently received much attention due to the widespread availability of MEs on the World Wide Web. Because MEs are two-dimensional in nature, traditional text retrieval techniques used in natural language processing are not sufficient for their retrieval. In this paper, we propose a novel structure-based approach to the ME retrieval problem. In our approach, a query given in LaTeX format is preprocessed to eliminate extraneous keywords (such as \displaystyle and \begin{array}) while retaining structure information such as superscript and subscript relationships. MEs in the database are preprocessed and stored in the same manner. We have created a database of 829 MEs in LaTeX form that covers various branches of mathematics, such as algebra, trigonometry, and calculus. The preprocessed query is matched against the database of preprocessed MEs using the Longest Common Subsequence (LCS) algorithm. LCS is used because it preserves the order of keywords in the preprocessed MEs, unlike the bag-of-words approach of traditional text retrieval techniques. We have incorporated structure information into the LCS algorithm and propose a measure, based on the modified algorithm, for ranking MEs in the database. Because the proposed approach exploits structure information, it is closer to human intuition. Retrieval performance has been evaluated using the standard precision measure.
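The pipeline described in the abstract (preprocess, match with LCS, rank) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the structure-aware modification of LCS or the ranking measure, so this sketch uses plain LCS over tokens and a length-normalised score as stand-ins. The names `preprocess`, `lcs_length`, `score`, and `rank`, the tokeniser, and the list of dropped commands are all assumptions made for illustration.

```python
import re

# Assumed list of layout-only commands; the abstract names only
# \displaystyle and \begin{array} as examples.
EXTRANEOUS = {r"\displaystyle", r"\left", r"\right"}
ENV_RE = re.compile(r"\\(?:begin|end)\{[A-Za-z*]+\}")
# A token is either a LaTeX command (\name) or one non-space character.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|\S")


def preprocess(latex):
    """Drop environment markers and extraneous commands; keep ^ and _ tokens."""
    stripped = ENV_RE.sub(" ", latex)
    return [t for t in TOKEN_RE.findall(stripped) if t not in EXTRANEOUS]


def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def score(query_tokens, me_tokens):
    """Assumed measure: LCS length divided by the longer sequence length."""
    longest = max(len(query_tokens), len(me_tokens))
    return lcs_length(query_tokens, me_tokens) / longest if longest else 0.0


def rank(query, database):
    """Return (score, expression) pairs, best match first."""
    q = preprocess(query)
    scored = [(score(q, preprocess(me)), me) for me in database]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    for s, me in rank(r"x^2", [r"y_1", r"\displaystyle x^{2}", r"x^2"]):
        print(f"{s:.2f}  {me}")
```

Because superscript (`^`) and subscript (`_`) markers survive preprocessing as ordinary tokens, an order-preserving match such as LCS can distinguish `x^2` from `x_2`, which a bag-of-words comparison over the remaining symbols alone would not.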
