Organizing Contextual Knowledge for Arabic Text Disambiguation and Terminology Extraction

Abstract

Ontologies play an important role in knowledge organization and information retrieval. Domain ontologies are composed of concepts represented by domain-relevant terms. Existing approaches to ontology construction use statistical and linguistic information to extract domain-relevant terms. The quality and quantity of this information influence the accuracy of terminology extraction and of other steps in knowledge extraction and information retrieval. This paper proposes an approach for handling domain-relevant terms from Arabic non-diacriticised semi-structured corpora. On the input side, the structure of the documents is exploited to organize knowledge in a contextual graph, which is then used to extract relevant terms. This network contains simple and compound nouns handled by a morphosyntactic shallow parser. The noun phrases are evaluated in terms of termhood and unithood by means of possibilistic measures. We apply a qualitative approach, which weighs terms according to their positions in the structure of the document. On the output side, the extracted knowledge is organized as a network modeling dependencies between terms, which can be exploited to infer semantic relations. We test our approach on three domain-specific corpora. The goal of this evaluation is to check whether our model for organizing and exploiting contextual knowledge improves the accuracy of extracting simple and compound nouns. We also investigate the role of compound nouns in improving information retrieval results.
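
The idea of weighing terms by their position in the document structure and linking them in a contextual network can be illustrated with a small sketch. This is not the authors' method: the abstract names possibilistic termhood and unithood measures that are not specified here, so the sketch substitutes plain additive weights. The function name `build_context_graph`, the `POSITION_WEIGHTS` values, and the sample terms are all hypothetical, and the terms are assumed to have already been extracted by a parser.

```python
from collections import defaultdict

# Hypothetical weights per structural position (assumed values, not from the paper).
POSITION_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}


def build_context_graph(documents):
    """Score terms and co-occurrence edges by structural position.

    Each document is a list of (position, terms) pairs, where `terms` are
    candidate nouns already extracted for that structural unit.
    """
    scores = defaultdict(float)  # term -> accumulated position weight
    edges = defaultdict(float)   # (term_a, term_b) -> accumulated weight
    for doc in documents:
        for position, terms in doc:
            weight = POSITION_WEIGHTS[position]
            for term in terms:
                scores[term] += weight
            # Link every pair of distinct terms that share a structural unit.
            unique = sorted(set(terms))
            for i, a in enumerate(unique):
                for b in unique[i + 1:]:
                    edges[(a, b)] += weight
    return dict(scores), dict(edges)


# Placeholder English terms stand in for Arabic nouns.
documents = [
    [
        ("title", ["ontology", "term extraction"]),
        ("body", ["ontology", "corpus"]),
    ]
]
scores, edges = build_context_graph(documents)
# Rank terms by descending score, breaking ties alphabetically.
ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)
```

Terms that appear in titles accumulate more weight than terms that appear only in body text, and the edge weights give a rough view of which terms depend on one another. A faithful implementation would replace the additive weights with the possibilistic measures described in the paper.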

Bibliographic details

  • Source
    Knowledge Organization | 2011, No. 6 | pp. 473-490 | 18 pages
  • Author affiliations

    Department of Computer Science, Faculty of Sciences of Tunis, University of Tunis, 1060 Tunis, Tunisia;

    RIADI-GDL Research Laboratory, The National School of Computer Sciences (ENSI), 2010 Manouba, Tunisia; Informatics Research Institute of Toulouse (IRIT), 02 Rue Camichel, 31071 Toulouse, France;

    Informatics Research Institute of Toulouse (IRIT), 02 Rue Camichel, 31071 Toulouse, France;

    Department of Computer Science, Faculty of Sciences of Tunis, University of Tunis, 1060 Tunis, Tunisia;

  • Original format: PDF
  • Language: English
