Journal: The International Arab Journal of Information Technology

Enhanced Latent Semantic Indexing Using Cosine Similarity Measures for Medical Application



Abstract

The Vector Space Model (VSM) is widely used in data mining and Information Retrieval (IR) systems as a common document representation model. However, the technique faces challenges such as the high dimensionality of the space and the semantic looseness of the representation. Latent Semantic Indexing (LSI) was therefore proposed to reduce the feature dimensions and to generate semantically rich features that can represent conceptual term-document associations. LSI has been employed effectively in search engines and many other Natural Language Processing (NLP) applications, and researchers continue to seek better performance. In this paper, we propose a method that search engines can use to find better-matched content among the retrieved documents. The proposed method introduces a new extension of the LSI technique based on cosine similarity measures. The performance evaluation was carried out on an Arabic-language data collection containing 800 medical documents with more than 47,222 unique words. The proposed method was assessed using a small test set of five medical keywords. The results show that the proposed method outperforms standard LSI.
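The abstract does not describe the proposed extension itself, so the following is only a minimal sketch of the standard LSI baseline it builds on: a raw term-frequency term-document matrix, a rank-k truncated SVD, query folding-in, and ranking by cosine similarity in the latent space. The function names, the English toy corpus `TOY_DOCS`, and the choice `k=2` are all assumptions made for illustration; they are not taken from the paper, whose evaluation used an Arabic medical collection.

```python
import numpy as np

# Hypothetical toy corpus (illustration only, not the paper's data).
TOY_DOCS = [
    "diabetes insulin glucose",
    "insulin glucose blood",
    "heart blood pressure",
    "heart pressure artery",
]


def build_term_document_matrix(docs):
    """Return (A, index): A[i, j] is the count of term i in document j."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    index = {word: i for i, word in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.split():
            A[index[word], j] += 1.0
    return A, index


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def lsi_rank(query, docs, k=2):
    """Rank documents against a query in a k-dimensional LSI space.

    Assumes k does not exceed the rank of the term-document matrix,
    so the retained singular values are non-zero.
    """
    A, index = build_term_document_matrix(docs)
    # Truncated SVD: A is approximated by U_k * diag(s_k) * Vt_k.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

    # Build the query's term vector, ignoring out-of-vocabulary words.
    q = np.zeros(A.shape[0])
    for word in query.split():
        if word in index:
            q[index[word]] += 1.0

    # Fold the query into the latent space: q_k = diag(s_k)^-1 U_k^T q.
    q_k = (U_k.T @ q) / s_k
    # Column j of Vt_k holds document j's latent coordinates.
    scores = [cosine(q_k, Vt_k[:, j]) for j in range(len(docs))]
    ranking = sorted(range(len(docs)), key=lambda j: scores[j], reverse=True)
    return ranking, scores


if __name__ == "__main__":
    ranking, scores = lsi_rank("diabetes", TOY_DOCS, k=2)
    for j in ranking:
        print(f"doc {j}: {scores[j]:+.3f}  {TOY_DOCS[j]}")
```

On this toy corpus, the query "diabetes" ranks the second document directly after the first even though that document never contains the query term, which illustrates the conceptual term-document association that LSI is meant to capture.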
