Journal: 《计算机系统应用》 (Computer Systems & Applications)

A Clustering Initialization Algorithm Based on KD-Tree Subsamples

         

Abstract

When initializing clustering on large data sets, random subsampling is an important data-reduction operation. This paper analyzes the process, properties, and shortcomings of random sampling and proposes a clustering initialization method based on KD-tree subsamples. The method uses a KD-tree to recursively partition the sample space into multiple subspaces, then draws random samples within each subspace to form the KD-tree subsample. This avoids the biased distribution that a plain random subsample can exhibit, so that good initial cluster centers found on the subsample also represent the cluster structure of the whole data set well. Simulation results show that the initial centers selected by this method lie closer to the desired cluster centers and yield higher clustering accuracy.
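The sampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the split rule, the stopping criterion, or how many points are drawn per subspace, so the choices below are assumptions: splitting at the median of the widest dimension, stopping at `leaf_size` points, and drawing `per_leaf` points per leaf. The function names `kd_leaves` and `kd_tree_subsample` are hypothetical and do not come from the paper.

```python
import random


def kd_leaves(points, leaf_size):
    """Recursively partition points into KD-tree leaves (lists of points).

    Assumed rule: split at the median along the dimension with the largest
    spread, and stop once a cell holds at most leaf_size points.
    """
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")
    if len(points) <= leaf_size:
        return [points]
    dims = range(len(points[0]))
    # Pick the axis whose coordinate range is widest in this cell.
    axis = max(
        dims,
        key=lambda d: max(p[d] for p in points) - min(p[d] for p in points),
    )
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return kd_leaves(ordered[:mid], leaf_size) + kd_leaves(ordered[mid:], leaf_size)


def kd_tree_subsample(points, leaf_size=64, per_leaf=4, seed=0):
    """Draw up to per_leaf random points from every KD-tree leaf."""
    rng = random.Random(seed)
    sample = []
    for leaf in kd_leaves(list(points), leaf_size):
        # Sampling inside each cell keeps every region of the space represented.
        sample.extend(rng.sample(leaf, min(per_leaf, len(leaf))))
    return sample


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
    subsample = kd_tree_subsample(data)
    print(len(subsample), "points drawn from", len(data))
```

Under these assumptions, every cell of the partition contributes points to the subsample, which is the property the abstract relies on to avoid a skewed subsample. Initial centers would then be sought on the subsample with whatever clustering procedure is in use; that step is not shown because the abstract does not specify it.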
