Journal: Computer Science & Information Technology
An Efficient Approach to Improve Arabic Documents Clustering Based on a new Keyphrases Extraction Algorithm

Abstract

The goal of document clustering algorithms is to create clusters that are internally coherent but clearly different from one another. The useful expressions in documents are often accompanied by a large amount of noise caused by unnecessary words, so it is indispensable to eliminate this noise and keep only the useful information. Keyphrase extraction systems for Arabic are a new phenomenon. A number of text mining applications can use them to improve their results. Keyphrases are defined as phrases that capture the main topics discussed in a document; they offer a brief and precise summary of the document's content. They can therefore be a good way to remove the noise present in documents. In this paper, we propose a new method to solve the problem cited above, especially for documents in Arabic, which is one of the most complex languages, by using a new keyphrase extraction algorithm based on the suffix tree data structure (KpST). To evaluate our approach, we conduct an experimental study on Arabic document clustering using the most popular hierarchical approach: the agglomerative hierarchical algorithm with seven linkage techniques and a variety of distance functions and similarity measures. The obtained results show that our approach for extracting keyphrases improves the clustering results.
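The pipeline outlined in the abstract (reduce each document to keyphrases, then cluster agglomeratively under a chosen linkage and distance) can be illustrated with a minimal sketch. This is not the KpST algorithm: the abstract gives no details of it, so the sketch swaps in a much cruder filter that keeps word n-grams shared by at least two documents. The function names, the `min_docs` and `max_len` parameters, the three linkages shown, and the toy English strings are all assumptions made for illustration; real Arabic text would also need proper tokenization and normalization.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def candidate_phrases(tokens, max_len=3):
    """Return every word n-gram of length 1..max_len in one document."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(tokens) - n + 1)]


def keyphrase_vectors(docs, min_docs=2, max_len=3):
    """Stand-in for keyphrase extraction (NOT KpST): keep only phrases
    that occur in at least min_docs documents, drop the rest as noise."""
    per_doc = [Counter(candidate_phrases(d.split(), max_len)) for d in docs]
    doc_freq = Counter(p for counts in per_doc for p in counts)
    keep = {p for p, df in doc_freq.items() if df >= min_docs}
    return [{p: n for p, n in counts.items() if p in keep}
            for counts in per_doc]


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse phrase-count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return 1.0 if na == 0 or nb == 0 else 1.0 - dot / (na * nb)


# Three common linkage rules; the paper's seven are not listed in the abstract.
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda ds: sum(ds) / len(ds),
}


def agglomerative(vectors, k, linkage="average"):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    link = LINKAGES[linkage]
    clusters = [[i] for i in range(len(vectors))]

    def cluster_dist(pair):
        a, b = pair
        return link([cosine_distance(vectors[i], vectors[j])
                     for i in clusters[a] for j in clusters[b]])

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2), key=cluster_dist)
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


docs = [
    "suffix tree keyphrase extraction",
    "keyphrase extraction with suffix tree",
    "football match result today",
    "football match score today",
]
vectors = keyphrase_vectors(docs)
print(agglomerative(vectors, k=2, linkage="average"))  # [[0, 1], [2, 3]]
```

Passing the linkage and the distance as interchangeable pieces mirrors the kind of comparison the abstract describes, where the same clustering procedure is run under several linkage techniques and distance or similarity measures.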