
k-PbC: an improved cluster center initialization for categorical data clustering

Abstract

The performance of a partitional clustering algorithm is influenced by the initial random choice of cluster centers, so different runs of the algorithm on the same data set often yield different results. This paper addresses that challenge by proposing an algorithm named k-PbC, which uses non-random initialization based on pattern mining to improve clustering quality. Specifically, k-PbC first applies maximal frequent itemset mining to find a set of initial clusters. It then uses a kernel-based method to form cluster centers and an information-theoretic dissimilarity measure to estimate the distance between cluster centers and data objects. An extensive experimental study on various real categorical data sets compares k-PbC with state-of-the-art categorical clustering algorithms in terms of clustering quality. The comparative results show that the proposed initialization method can enhance clustering results and that k-PbC outperforms the compared algorithms on both internal and external validation metrics.
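The pattern-based initialization idea can be illustrated with a toy sketch. This is not the k-PbC implementation: it assumes a brute-force enumeration of itemsets that is only feasible on tiny data, and the function names `maximal_frequent_itemsets` and `initial_clusters`, the `min_support` parameter, the rule for assigning rows to patterns, and the sample rows are all hypothetical. The kernel-based centers and the information-theoretic dissimilarity measure mentioned in the abstract are not reproduced here.

```python
from itertools import combinations


def maximal_frequent_itemsets(rows, min_support):
    """Brute-force maximal frequent itemsets over (attribute_index, value) items."""
    transactions = [frozenset(enumerate(row)) for row in rows]
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(rows[0]) + 1):
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            # Skip candidates holding two values of the same attribute.
            if len({attr for attr, _ in candidate}) < size:
                continue
            support = sum(candidate <= t for t in transactions)
            if support >= min_support:
                frequent.append(candidate)
    # An itemset is maximal if no frequent proper superset exists.
    return [f for f in frequent if not any(f < g for g in frequent)]


def initial_clusters(rows, patterns):
    """Assign each row index to the first pattern it contains (largest first)."""
    ordered = sorted(patterns, key=lambda p: (-len(p), sorted(p)))
    clusters = {p: [] for p in ordered}
    for i, row in enumerate(rows):
        transaction = frozenset(enumerate(row))
        for p in ordered:
            if p <= transaction:
                clusters[p].append(i)
                break  # rows matching no pattern stay unassigned
    return clusters


# Hypothetical categorical data: (color, shape, size).
rows = [
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("red", "round", "medium"),
    ("blue", "square", "large"),
    ("blue", "square", "small"),
    ("blue", "square", "medium"),
]
patterns = maximal_frequent_itemsets(rows, min_support=3)
clusters = initial_clusters(rows, patterns)
for pattern, members in clusters.items():
    print(sorted(pattern), members)
```

Because the patterns are derived deterministically from the data, repeated runs on the same rows produce the same starting groups, which is the property that random center selection lacks.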
