Journal: Procedia Computer Science

Ontological Optimization for Latent Semantic Indexing of Arabic Corpus



Abstract

Dimensionality reduction is a critical problem in the information retrieval process. Higher dimensionality directly affects search performance in terms of Recall and Precision. Dimensionality reduction enables the search to be semantically based instead of lexically based, because the dimensions are defined in terms of semantic concepts instead of traditional terms or keywords. Latent Semantic Indexing (LSI) is a mathematical extension of the classical Vector Space Model (VSM). LSI is used to discover the latent semantics in the search space by extracting concepts from the original terms in the space. LSI relies on the Singular Value Decomposition (SVD) to reduce the dimension of the term space into a lower-dimensional LSI space. In this paper, we propose a methodology that further optimizes LSI dimension reduction through two reduction levels. The first reduction level is based on an ontological conceptualization process: the Universal Wordnet ontology (UWN) is used to develop an ontology-based concept space in place of the term space. As a second reduction level, the SVD is applied to the extracted concept space to obtain an optimal LSI conceptualization. The experimental results of this research indicate an improvement in the search results in terms of both Precision and Recall, as the proposed methodology addresses the Synonymy and Polysemy problems effectively.
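
The two reduction levels described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `TERM_TO_CONCEPT` dictionary is a hypothetical stand-in for a Universal Wordnet lookup, the toy documents are in English rather than Arabic, and the function names (`concept_matrix`, `lsi`, `fold_in`, `cosine`) are chosen here for illustration only. The first level merges synonymous terms into shared concepts; the second applies a truncated SVD to the resulting concept-document matrix.

```python
import numpy as np

# Hypothetical term -> concept map standing in for a Universal Wordnet lookup.
TERM_TO_CONCEPT = {
    "car": "vehicle",
    "automobile": "vehicle",
    "doctor": "physician",
    "physician": "physician",
}


def concept_matrix(docs, term_to_concept):
    """First reduction level: count concepts instead of raw terms.

    Terms without a mapping are kept as their own concept.
    Returns the concept-document count matrix and the sorted concept list.
    """
    concepts = sorted({term_to_concept.get(t, t) for d in docs for t in d})
    index = {c: i for i, c in enumerate(concepts)}
    A = np.zeros((len(concepts), len(docs)))
    for j, doc in enumerate(docs):
        for t in doc:
            A[index[term_to_concept.get(t, t)], j] += 1.0
    return A, concepts


def lsi(A, k):
    """Second reduction level: rank-k truncated SVD of the concept space."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in(query_vec, U_k, s_k):
    """Project a concept-space query vector into the k-dimensional LSI space."""
    return (query_vec @ U_k) / s_k


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


if __name__ == "__main__":
    docs = [
        ["car", "engine"],
        ["automobile", "engine"],
        ["doctor", "hospital"],
        ["physician", "hospital"],
    ]
    A, concepts = concept_matrix(docs, TERM_TO_CONCEPT)
    U_k, s_k, Vt_k = lsi(A, k=2)

    # Query "automobile" is mapped to its concept before projection.
    q = np.zeros(len(concepts))
    q[concepts.index(TERM_TO_CONCEPT["automobile"])] = 1.0
    q_hat = fold_in(q, U_k, s_k)

    # Column j of Vt_k is document j in the LSI space.
    for j in range(len(docs)):
        print(j, round(cosine(q_hat, Vt_k[:, j]), 3))
```

In this toy corpus the query term "automobile" never appears in the first document, yet both map to the same concept, so the two are matched in the reduced space. That is the Synonymy effect the abstract refers to; Polysemy would additionally require sense disambiguation, which this sketch does not attempt.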
