Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence

Combining multiple clusterings using evidence accumulation

Abstract

We explore the idea of evidence accumulation (EAC) for combining the results of multiple clusterings. First, a clustering ensemble (a set of object partitions) is produced. Given a data set (n objects or patterns in d dimensions), different ways of producing data partitions are: 1) applying different clustering algorithms and 2) applying the same clustering algorithm with different parameter values or initializations. Further, combinations of different data representations (feature spaces) and clustering algorithms can also provide a multitude of significantly different data partitionings. We propose a simple framework for extracting a consistent clustering, given the various partitions in a clustering ensemble. According to the EAC concept, each partition is viewed as independent evidence of data organization; the individual data partitions are combined, based on a voting mechanism, to generate a new n × n similarity matrix between the n patterns. The final data partition of the n patterns is obtained by applying a hierarchical agglomerative clustering algorithm to this matrix. We have developed a theoretical framework for the analysis of the proposed clustering combination strategy and its evaluation, based on the concept of mutual information between data partitions. Stability of the results is evaluated using bootstrapping techniques. A detailed discussion of an evidence accumulation-based clustering algorithm, using a split-and-merge strategy based on the k-means clustering algorithm, is presented. Experimental results of the proposed method on several synthetic and real data sets are compared with other combination strategies and with individual clustering results produced by well-known clustering algorithms.
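
The voting step and the final agglomerative step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `co_association` and `single_link`, the toy ensemble, the choice of single-link merging, and the fixed number of final clusters `k` are all assumptions made here for illustration. The abstract only says that a voting mechanism yields an n × n similarity matrix and that a hierarchical agglomerative algorithm is applied to it.

```python
from itertools import combinations


def co_association(partitions):
    """Voting step: entry (i, j) is the fraction of partitions in which
    objects i and j were placed in the same cluster."""
    n = len(partitions[0])
    votes = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    votes[i][j] += 1.0
    m = len(partitions)
    return [[v / m for v in row] for row in votes]


def single_link(similarity, k):
    """Agglomerative single-link clustering on a similarity matrix,
    stopped when k clusters remain (k is an assumed input here)."""
    clusters = [[i] for i in range(len(similarity))]
    while len(clusters) > k:
        # Single link: two clusters are as similar as their most similar pair.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda ab: max(
                similarity[i][j]
                for i in clusters[ab[0]]
                for j in clusters[ab[1]]
            ),
        )
        # a < b always holds, so deleting b does not shift index a.
        clusters[a].extend(clusters[b])
        del clusters[b]
    labels = [0] * len(similarity)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


# Hypothetical ensemble: three partitions of six objects.
# Label values are arbitrary within each partition.
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 2, 2],
    [0, 0, 0, 0, 1, 1],
]
C = co_association(ensemble)
final_labels = single_link(C, k=2)
print(final_labels)  # [0, 0, 0, 0, 1, 1]
```

In this toy ensemble, objects 4 and 5 share a cluster in every partition, so their co-association is 1.0, while object 3 is only loosely tied to either group. Because the matrix depends only on whether two objects share a label, the partitions may use different label names and different numbers of clusters.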