International Conference on Cloud Computing and Big Data

An Improved K-means Text Clustering Algorithm by Optimizing Initial Cluster Centers



Abstract

The K-means clustering algorithm is an influential algorithm in data mining. The traditional K-means algorithm is sensitive to the choice of initial cluster centers, so the clustering result depends excessively on them. To overcome this shortcoming, this paper proposes an improved K-means text clustering algorithm that optimizes the initial cluster centers. The algorithm first calculates the density of each data object in the data set and then determines which data objects are isolated points. After all isolated points are removed, a set of high-density data objects remains. From this set, the algorithm chooses as initial cluster centers the k high-density data objects whose mutual distances are the largest. The experimental results show that the improved K-means algorithm improves the stability and accuracy of text clustering.
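The abstract does not give the density formula, the isolated-point threshold, or the exact distance criterion, so the following Python sketch only illustrates the general idea under stated assumptions. Density is assumed to be the number of other points within a radius `eps`; points with density below `min_density` are treated as isolated points (both parameters are hypothetical). The "largest distance" requirement is approximated by a greedy max-min rule starting from the densest point, and plain Euclidean vectors stand in for document vectors such as TF-IDF. This is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def densities(points: Sequence[Point], eps: float) -> List[int]:
    """Assumed density: number of *other* points within radius eps."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and euclidean(p, q) <= eps)
        for i, p in enumerate(points)
    ]


def init_centers(points: Sequence[Point], k: int, eps: float,
                 min_density: int) -> List[Point]:
    """Choose k far-apart initial centers from the high-density points.

    Hypothetical heuristic: points with density < min_density are dropped as
    isolated points; the densest remaining point is the first center, and each
    further center is the candidate whose distance to its nearest chosen
    center is largest (greedy max-min selection).
    """
    dens = densities(points, eps)
    candidates = [(d, p) for p, d in zip(points, dens) if d >= min_density]
    if len(candidates) < k:
        raise ValueError("fewer than k high-density points; adjust eps or min_density")
    pool = [p for _, p in candidates]
    centers = [max(candidates, key=lambda c: c[0])[1]]  # densest candidate
    while len(centers) < k:
        nxt = max(pool, key=lambda p: min(euclidean(p, c) for c in centers))
        centers.append(nxt)
    return centers


def kmeans(points: Sequence[Point], centers: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Standard Lloyd iterations on all points, starting from the given centers."""
    centers = [tuple(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for each point.
        labels = [min(range(len(centers)), key=lambda j: euclidean(p, centers[j]))
                  for p in points]
        # Update step: mean of each cluster; an empty cluster keeps its old center.
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(c)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
            (20.0, 20.0)]  # last point is an outlier
    init = init_centers(data, k=2, eps=0.5, min_density=2)
    labels, centers = kmeans(data, init)
    print(init, labels)
```

In this toy run the outlier `(20.0, 20.0)` has density 0, so it can never be picked as an initial center, although the final K-means pass still assigns it to a cluster. Whether the paper also excludes isolated points from the final clustering is not stated in the abstract.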
