Conference: IET International Conference on Smart and Sustainable City 2013

A H-K clustering algorithm based on ensemble learning


Abstract

The traditional H-K clustering algorithm removes the randomness of, and the need for prior knowledge about, the initial centers in the K-means clustering algorithm. However, because of its high computational complexity, it suffers from the curse of dimensionality when applied to high-dimensional datasets. Clustering ensembles apply ensemble learning to obtain a better clustering result by learning from the merged output of multiple clusterings. The objective of this paper is to improve the performance of the traditional H-K clustering algorithm on high-dimensional datasets. Using ensemble learning, a new clustering algorithm named EPCAHK (Ensemble Principal Component Analysis Hierarchical K-means Clustering algorithm) is proposed. In EPCAHK, the high-dimensional dataset is first mapped into a low-dimensional space using PCA. Next, the clustering results of the hierarchical stage, which supply the initial information (e.g., the cluster number or the initial cluster centers), are integrated using the min-transitive closure method. Finally, the K-means clustering algorithm is run on the basis of these ensemble results to produce the final clustering. Experimental results indicate that EPCAHK obtains better clustering results than the traditional H-K clustering algorithm: the average clustering accuracy reaches 90% or above, and stability on large high-dimensional datasets is also improved.
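The integration step can be illustrated with a minimal sketch. The abstract does not say how the similarity relation between points is built, so this example assumes a co-association matrix (the fraction of base clusterings in which two points share a cluster) and applies the standard max-min transitive closure to it. The function names, the toy labelings, and the 0.5 cut threshold are all hypothetical and are not taken from the paper.

```python
def co_association(labelings):
    """Fraction of base clusterings in which each pair of points shares a cluster.

    Assumption: the abstract does not define the similarity relation, so a
    co-association matrix is used here purely for illustration.
    """
    n = len(labelings[0])
    m = len(labelings)
    return [
        [sum(1 for lab in labelings if lab[i] == lab[j]) / m for j in range(n)]
        for i in range(n)
    ]


def max_min_compose(a, b):
    """Max-min composition of two square fuzzy relations."""
    n = len(a)
    return [
        [max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def min_transitive_closure(sim):
    """Repeat max-min composition until the relation stops changing."""
    current = sim
    while True:
        composed = max_min_compose(current, current)
        # Keep every existing similarity; only raise values, never lower them.
        nxt = [
            [max(x, y) for x, y in zip(row_c, row_p)]
            for row_c, row_p in zip(composed, current)
        ]
        if nxt == current:
            return current
        current = nxt


def cut_clusters(closure, threshold):
    """Group points whose closure similarity is at least `threshold`.

    The closure is a fuzzy equivalence relation, so each cut is a partition.
    """
    n = len(closure)
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] == -1:
            for j in range(n):
                if closure[i][j] >= threshold:
                    labels[j] = next_label
            next_label += 1
    return labels


# Hypothetical output of three hierarchical runs on five points.
base_labelings = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 1, 2, 2],
]
closure = min_transitive_closure(co_association(base_labelings))
consensus = cut_clusters(closure, 0.5)  # threshold chosen arbitrarily
print(consensus)  # [0, 0, 0, 1, 1]
```

In this toy case the consensus partition has two clusters. Under the abstract's description, the cluster count (or the per-cluster centroids in the PCA-reduced space) would then seed the final K-means run.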
