Journal of Information Recording

Retrieval of Mathematical Information with Syntactic and Semantic Structure over Web


Abstract

Retrieving mathematical expressions over the web efficiently is more complex than simple text search. It is possible only when the syntactic (e.g., textual) and semantic (e.g., structural) information of a mathematical expression is retrieved properly and analyzed methodically. In this paper, we propose a technique that indexes expressions along with their syntactic and semantic information. The expressions are represented in Content MathML (CMML). To improve the memory efficiency of the index, we introduce an encoding technique that encodes CMML mathematical expressions in Braille Unicode characters. To improve the ranking of retrieved documents, we introduce a weighting function that assigns a weight to each indexing term. The weight of each term contributes to the ranking function, which improves the rank of documents that contain query terms. The proposed technique is evaluated on the NTCIR-12 Wikipedia and arXiv corpora, and performance is measured using the NTCIR-MathIR evaluation criteria. Precision at the top 5 documents reaches 47% for Wikipedia formula queries and 44% for arXiv.
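
The abstract does not give the actual CMML-to-Braille mapping, so the following is only a minimal sketch of the general idea, under stated assumptions: each Content MathML element name is replaced by a single character from the Unicode Braille Patterns block (starting at U+2800), and the text of `ci`/`cn` leaves is kept verbatim. The tag list `CMML_TAGS`, the code-point assignment, and the function name `encode_cmml` are all hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

BRAILLE_BASE = 0x2800  # first code point of the Unicode Braille Patterns block

# Hypothetical tag vocabulary (assumption): the paper's real mapping is not
# stated in the abstract. Each tag gets one Braille character, U+2801 onward.
CMML_TAGS = ["apply", "ci", "cn", "plus", "minus", "times", "divide", "power", "eq"]
TAG_TO_BRAILLE = {tag: chr(BRAILLE_BASE + i + 1) for i, tag in enumerate(CMML_TAGS)}


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://...}apply'."""
    return tag.rsplit("}", 1)[-1]


def encode_cmml(xml_text):
    """Encode a Content MathML fragment as a compact string.

    Elements are visited in document (pre-order) order. Known element names
    become one Braille character each; identifiers and numbers inside
    <ci>/<cn> are appended as plain text. Unknown elements are skipped.
    """
    root = ET.fromstring(xml_text)
    pieces = []
    for node in root.iter():
        name = local_name(node.tag)
        if name in TAG_TO_BRAILLE:
            pieces.append(TAG_TO_BRAILLE[name])
        text = (node.text or "").strip()
        if name in ("ci", "cn") and text:
            pieces.append(text)
    return "".join(pieces)


if __name__ == "__main__":
    # x + 1 in Content MathML
    print(encode_cmml("<apply><plus/><ci>x</ci><cn>1</cn></apply>"))
```

In this sketch the expression x + 1 becomes a six-character string (four Braille symbols plus the literals `x` and `1`), which could then be used as an index term in place of the full markup. Whether the paper encodes whole expressions, subexpressions, or individual tokens this way is not stated in the abstract.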
