International Journal of Collaborative Intelligence

CC-K-means: a candidate centres-based K-means algorithm for text data


Abstract

The K-means algorithm is one of the most widely used clustering algorithms and is applied to many kinds of data thanks to its simplicity and efficiency. However, because the traditional K-means algorithm selects its initial centre points at random, it suffers from defects such as slow convergence and unstable clustering results. To overcome the high dimensionality of text data, the proposed algorithm first applies a latent semantic indexing (LSI) model to reduce the dimensionality of the feature vectors, and then uses a weighted adjusted cosine similarity to measure the similarity between documents in order to obtain better clustering results. While searching for the clustering centres, high-density candidate centre points are partially updated on the basis of density to obtain the final clustering centres. Experimental results show that the proposed algorithm can accurately find representative and well-dispersed clustering centres and achieves better clustering performance.
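The abstract does not spell out how the LSI step is implemented. As a rough illustration only, the sketch below computes a truncated singular value decomposition of a term-document matrix, A ≈ U_k S_k V_k^T, using plain power iteration with deflation, and returns each document's k-dimensional coordinates (the columns of S_k V_k^T). The helper name `lsi_document_vectors`, its parameters and the use of power iteration are assumptions for illustration; a real system would normally call a library SVD routine instead.

```python
import math
import random


def _matvec(matrix, vec):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def lsi_document_vectors(term_doc, k, iters=200, seed=0):
    """Hypothetical helper: project documents into a k-dimensional LSI space.

    term_doc: m x n matrix (terms x documents), e.g. TF-IDF weights.
    Returns (sigmas, doc_vectors): the top-k singular values and, for each
    document j, its coordinates = column j of S_k V_k^T.
    Power iteration converges slowly when singular values are close together.
    """
    m, n = len(term_doc), len(term_doc[0])
    # Gram matrix G = A^T A (n x n): its eigenvectors are the right singular
    # vectors of A and its eigenvalues are the squared singular values.
    gram = [[sum(term_doc[t][i] * term_doc[t][j] for t in range(m))
             for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    sigmas, right_vecs = [], []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = _matvec(gram, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # remaining spectrum is zero (rank < k)
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector v gives the eigenvalue sigma^2.
        eig = sum(a * b for a, b in zip(v, _matvec(gram, v)))
        sigmas.append(math.sqrt(max(eig, 0.0)))
        right_vecs.append(v)
        # Deflation: remove the found component so the next pass finds the next one.
        gram = [[gram[i][j] - eig * v[i] * v[j] for j in range(n)]
                for i in range(n)]
    doc_vectors = [[sigmas[c] * right_vecs[c][j] for c in range(k)]
                   for j in range(n)]
    return sigmas, doc_vectors


# Usage: 2 terms x 2 documents.
sigmas, docs = lsi_document_vectors([[3.0, 0.0], [0.0, 1.0]], k=2)
print([round(s, 6) for s in sigmas])  # -> [3.0, 1.0]
```

Singular vectors are only defined up to sign, so individual coordinates may flip sign between implementations; distances and cosine similarities between the reduced document vectors are unaffected.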
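The abstract does not define the weights or the adjustment used in the weighted adjusted cosine similarity. One plausible reading, borrowed from the adjusted cosine measure used in item-based collaborative filtering, is to centre each dimension before taking the cosine and to weight every dimension in both the dot product and the norms: sim(x, y) = Σ_k w_k (x_k − μ_k)(y_k − μ_k) / (‖x − μ‖_w ‖y − μ‖_w). The sketch below implements that reading under stated assumptions; the helper names, the corpus-mean centring and the per-dimension `weights` (for example, the LSI singular values) are hypothetical choices, not the paper's definition.

```python
import math


def column_means(vectors):
    """Per-dimension mean over a corpus of equal-length document vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def weighted_adjusted_cosine(x, y, mean, weights):
    """Hypothetical weighted adjusted cosine similarity between two documents.

    Each dimension is centred by subtracting `mean`, then weighted by the
    non-negative `weights` in both the dot product and the norms.
    Returns a value in [-1, 1] (up to rounding); 0.0 if either centred
    vector has zero weighted norm.
    """
    dx = [a - m for a, m in zip(x, mean)]
    dy = [b - m for b, m in zip(y, mean)]
    num = sum(w * a * b for w, a, b in zip(weights, dx, dy))
    norm_x = math.sqrt(sum(w * a * a for w, a in zip(weights, dx)))
    norm_y = math.sqrt(sum(w * b * b for w, b in zip(weights, dy)))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return num / (norm_x * norm_y)


# Usage: three documents in a 2-dimensional (e.g. LSI-reduced) space.
docs = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
mu = column_means(docs)  # [1.0, 1.0]
print(round(weighted_adjusted_cosine(docs[0], docs[1], mu, [1.0, 1.0]), 6))  # -> -1.0
```

Setting a weight to zero drops that dimension entirely, so the weights control how much each latent dimension contributes to document similarity.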

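The abstract describes choosing high-density candidate points as initial centres, but not the exact density measure or the partial-update rule. The sketch below shows a generic density-based seeding for K-means under stated assumptions: a point's density is the number of neighbours within a hypothetical `radius`, the densest point becomes the first centre, and each further centre maximises (density + 1) times its distance to the nearest centre already chosen. It then runs standard Lloyd iterations from those seeds; this is not the paper's partial-update procedure, and `density_seeds` and `kmeans` are illustrative names.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def density_seeds(points, k, radius, dist=euclidean):
    """Hypothetical density-based seeding: pick k dense, mutually distant points.

    density[i] = number of other points within `radius` of point i.
    """
    n = len(points)
    density = [sum(1 for j in range(n) if j != i and dist(points[i], points[j]) <= radius)
               for i in range(n)]
    seeds = [max(range(n), key=lambda i: density[i])]  # densest point first
    while len(seeds) < k:
        # Favour points that are both dense and far from every chosen seed.
        best = max((i for i in range(n) if i not in seeds),
                   key=lambda i: (density[i] + 1)
                   * min(dist(points[i], points[s]) for s in seeds))
        seeds.append(best)
    return [list(points[i]) for i in seeds]


def kmeans(points, centres, max_iter=100, dist=euclidean):
    """Standard Lloyd iterations starting from the given centres."""
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centres)), key=lambda c: dist(p, centres[c]))
                  for p in points]
        new_centres = []
        for c, old in enumerate(centres):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous centre if a cluster becomes empty.
            new_centres.append([sum(col) / len(members) for col in zip(*members)]
                               if members else old)
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres


# Usage: two well-separated groups of three 2-D points.
pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
seeds = density_seeds(pts, k=2, radius=2.0)
labels, centres = kmeans(pts, seeds)
print(seeds, labels)  # -> [[0, 0], [10, 11]] [0, 0, 0, 1, 1, 1]
```

Any distance function can be passed as `dist`, for instance one minus a cosine-style similarity; Euclidean distance is the default here only to keep the sketch short. Because the seeds are deterministic, repeated runs give the same clustering, which is the stability the abstract aims for.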