Conference: 2017 Intelligent Systems Conference

Improving Arabic document clustering using K-means algorithm and Particle Swarm Optimization


Abstract

Document clustering plays a vital role in text mining fields such as information retrieval, sentiment analysis, and text organization. It aims to automatically divide a collection of documents, based on some aspect of similarity, into groups that are meaningful, useful, or both. This paper aims to improve the clustering task for Arabic documents. Recent studies show that partitioning clustering algorithms are more suitable for the clustering process, and k-means is the most commonly used of these because of its simplicity and speed. However, k-means can only generate an arbitrary solution, because its results depend on the initial centers chosen for the desired clusters (the "seeds"). In this paper, a modified k-means algorithm called PSO K-means, supported by Particle Swarm Optimization (PSO), is applied to enhance the Arabic document clustering process. An intensive comparative study between the proposed model and the standard k-means algorithm is then carried out. The stemming algorithms used in Arabic language processing are also assessed. In the experiments, the new algorithm is evaluated on three different Arabic data sets. The results demonstrate that the proposed model produces more accurate results than the standard k-means algorithm for Arabic-language documents. In addition, the Arabic light stemmer is found to be more suitable for the stemming step.
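The abstract does not spell out how PSO and k-means are combined, so the following is only a minimal sketch of one common hybrid, not the authors' method: a basic global-best PSO searches for a good set of initial centers, and plain k-means then refines them. The function names (`sse`, `pso_seeds`, `kmeans`), the PSO parameters (`w`, `c1`, `c2`, `n_particles`, `n_iter`), and the use of squared Euclidean distance are all assumptions made for illustration.

```python
import numpy as np


def sse(centers, X):
    """Sum of squared distances from each point to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()


def pso_seeds(X, k, n_particles=10, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for k initial centers with a basic global-best PSO (assumed variant)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Each particle is one candidate set of k centers, started on random data points.
    pos = np.stack([X[rng.choice(n, k, replace=False)] for _ in range(n_particles)])
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array([sse(p, X) for p in pos])
    g = pbest_cost.argmin()
    gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Standard velocity update: inertia + pull to personal best + pull to global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        cost = np.array([sse(p, X) for p in pos])
        improved = cost < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = cost[improved]
        g = pbest_cost.argmin()
        if pbest_cost[g] < gbest_cost:
            gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]
    return gbest


def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    X = np.asarray(X, dtype=float)
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                new[j] = members.mean(axis=0)
        if np.allclose(new, centers):
            break
        centers = new
    # Recompute labels so they match the returned centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)
```

In a document-clustering setting, `X` would typically be a matrix of term-weight vectors (for example TF-IDF) built from stemmed Arabic text; that preprocessing is assumed here and not shown. Calling `kmeans(X, pso_seeds(X, k))` replaces the random seeding of standard k-means with the PSO result.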
