Journal: Information Processing & Management

Building a morpho-semantic knowledge graph for Arabic information retrieval



Abstract

In this paper, we propose building a morpho-semantic knowledge graph from vocalized Arabic corpora. Our work focuses on classical Arabic, which has not been deeply investigated in related work. We use a tool suite that analyzes and disambiguates Arabic texts, taking short diacritics into account to reduce ambiguity. At the morphological level, we combine the Ghwanmeh stemmer and MADAMIRA, both adapted to extract a multi-level lexicon from vocalized Arabic corpora. At the semantic level, we infer semantic dependencies between tokens by exploiting contextual knowledge extracted by a concordancer. Both morphological and semantic links are represented as compressed graphs, which are accessed through lazy methods. These graphs are mined using a measure inspired by BM25 to compute one-to-many similarity. We evaluate the morpho-semantic knowledge graph in the context of Arabic information retrieval (IR), assessing several scenarios of document indexing and query expansion. Specifically, we vary the indexing units for Arabic IR according to different levels of morphological knowledge, a challenging issue that previous research has not yet resolved. We also experiment with several combinations of morpho-semantic query expansion. This allows us to validate our resource and to study its impact on IR using state-of-the-art evaluation metrics.
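
The abstract does not give the formula of the BM25-inspired measure, so the following is only a minimal sketch of how a one-to-many similarity over a weighted term graph might drive query expansion. Everything in it is an assumption made for illustration: the function names (`build_graph`, `bm25_like_scores`, `expand_query`), the use of edge weights in place of term frequency, the use of a node's total edge weight in place of document length, and the transliterated placeholder tokens. It is not the authors' implementation.

```python
import math
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted graph from (term_a, term_b, weight) triples."""
    graph = defaultdict(dict)
    for a, b, w in edges:
        graph[a][b] = graph[a].get(b, 0.0) + w
        graph[b][a] = graph[b].get(a, 0.0) + w
    return dict(graph)


def bm25_like_scores(graph, query_terms, k1=1.2, b=0.75):
    """Score every neighbor of the query terms against the whole query set.

    Hypothetical analogy with BM25 (an assumption, not the paper's formula):
      - edge weight w(q, c)        plays the role of term frequency
      - total edge weight of c     plays the role of document length
      - number of neighbors of q   plays the role of document frequency
    """
    query_terms = set(query_terms)
    n_nodes = len(graph)
    if n_nodes == 0:
        return {}
    strength = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    avg_strength = sum(strength.values()) / n_nodes
    if avg_strength == 0:
        return {}

    scores = defaultdict(float)
    for q in query_terms:
        nbrs = graph.get(q, {})
        if not nbrs:
            continue  # query term unknown to the graph
        n_q = len(nbrs)
        # Smoothed IDF: terms linked to few nodes are more discriminative.
        idf = math.log(1.0 + (n_nodes - n_q + 0.5) / (n_q + 0.5))
        for cand, w in nbrs.items():
            if cand in query_terms:
                continue  # do not propose terms already in the query
            norm = 1.0 - b + b * strength.get(cand, 0.0) / avg_strength
            scores[cand] += idf * w * (k1 + 1.0) / (w + k1 * norm)
    return dict(scores)


def expand_query(graph, query_terms, top_n=2):
    """Append the top_n best-scoring related terms to the original query."""
    scores = bm25_like_scores(graph, query_terms)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [term for term, _ in ranked[:top_n]]


# Toy graph with placeholder transliterated tokens and made-up weights.
toy_graph = build_graph([
    ("kataba", "kitab", 3.0),
    ("kataba", "katib", 2.0),
    ("kitab", "maktaba", 1.0),
    ("qalam", "katib", 1.0),
])
expanded = expand_query(toy_graph, ["kataba"], top_n=1)
```

In this sketch a candidate is scored against all query terms at once, which is one plausible reading of "one-to-many" similarity; the saturation parameter `k1` and normalization parameter `b` use the customary BM25 defaults rather than values taken from the paper.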
