Journal: Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies

High performance genetic algorithm based text clustering using parts of speech and outlier elimination

Abstract

Among the typical clustering methods, the K-means algorithm plays the most important role because of its simplicity and efficiency. However, it is sensitive to the choice of initial points and prone to falling into local optima. To avoid this flaw, this paper presents a patented text clustering algorithm, Clustering by Genetic Algorithm Model (CGAM). CGAM constructs the fitness function of a genetic algorithm (GA) and the convergence criterion for the K-means algorithm, because a GA simulates the natural evolutionary process and deals with a larger search space. To handle the rich semantics of Chinese texts, CGAM introduces a new method for selecting the initial centers of the GA and accounts for the differing contributions of features from different parts of speech. The impact of outliers is also addressed. The algorithm's performance is demonstrated by a series of experiments on both Reuters-21578 and a Chinese text corpus. Experimental results show that CGAM achieves better clustering results than other GA-based K-means algorithms, and that it has been successfully applied in a national business intelligence system program handling a very large body of content in both Chinese and English.
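
The abstract does not specify CGAM's fitness function, its initial-center selection, its part-of-speech weighting, or its outlier treatment, so none of those are reproduced here. As a rough illustration of the general idea only (a GA searching over sets of cluster centers, with K-means steps as local refinement), a minimal generic sketch might look like the following. All names (`ga_kmeans`, `kmeans_step`, `sse`), the fitness `1 / (1 + SSE)`, and the parameter defaults are assumptions made for this example, not the authors' method.

```python
import numpy as np


def sse(X, centers):
    """Sum of squared distances from each point to its nearest center."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()


def kmeans_step(X, centers):
    """One Lloyd iteration: assign points, then recompute non-empty centers."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    new_centers = centers.copy()
    for j in range(len(centers)):
        members = X[labels == j]
        if len(members) > 0:
            new_centers[j] = members.mean(axis=0)
    return new_centers


def ga_kmeans(X, k, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Generic GA + K-means hybrid (illustrative; not the CGAM algorithm)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Each individual is a candidate set of k centers sampled from the data.
    population = [
        X[rng.choice(n, size=k, replace=False)].astype(float)
        for _ in range(pop_size)
    ]
    best, best_cost = None, np.inf
    for _ in range(generations):
        # Local refinement: one K-means step per individual.
        population = [kmeans_step(X, c) for c in population]
        costs = np.array([sse(X, c) for c in population])
        i = int(costs.argmin())
        if costs[i] < best_cost:
            best, best_cost = population[i].copy(), float(costs[i])
        # Assumed fitness: lower SSE gives higher selection probability.
        fitness = 1.0 / (1.0 + costs)
        probs = fitness / fitness.sum()
        children = []
        for _ in range(pop_size - 1):
            a, b = rng.choice(pop_size, size=2, p=probs)
            # Uniform crossover: take each center from one of the two parents.
            mask = rng.random(k) < 0.5
            child = np.where(mask[:, None], population[a], population[b])
            # Mutation: occasionally replace a center with a random data point.
            for j in range(k):
                if rng.random() < mutation_rate:
                    child[j] = X[rng.integers(n)]
            children.append(child)
        # Elitism: carry the best individual found so far into the next round.
        population = children + [best.copy()]
    return best, best_cost
```

Calling `ga_kmeans(X, k)` on a NumPy array `X` of shape `(n_samples, n_features)` returns the best set of centers found and its sum of squared errors. Keeping a population of candidate center sets is what lets this kind of hybrid explore beyond the single starting point that plain K-means depends on.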
