IEEE Transactions on Knowledge and Data Engineering

Incremental Semi-Supervised Clustering Ensemble for High Dimensional Data Clustering


Abstract

Traditional cluster ensemble approaches have three limitations: (1) they do not make use of prior knowledge about the datasets provided by experts; (2) most conventional cluster ensemble methods cannot obtain satisfactory results when handling high-dimensional data; and (3) all ensemble members are considered, even those that make no positive contribution. To address these limitations, we first propose an incremental semi-supervised clustering ensemble framework (ISSCE), which exploits the advantages of the random subspace technique, the constraint propagation approach, a proposed incremental ensemble member selection process, and the normalized cut algorithm to perform high-dimensional data clustering. The random subspace technique is effective for handling high-dimensional data, while the constraint propagation approach is useful for incorporating prior knowledge. The incremental ensemble member selection process is newly designed to judiciously remove redundant ensemble members based on a newly proposed local cost function and a global cost function, and the normalized cut algorithm serves as the consensus function, providing more stable, robust, and accurate results. Then, a measure is proposed to quantify the similarity between two sets of attributes; this measure is used to compute the local cost function in ISSCE. Next, we analyze the time complexity of ISSCE theoretically. Finally, a set of nonparametric tests is adopted to compare multiple semi-supervised clustering ensemble approaches over different datasets. Experiments on 18 real-world datasets, including six UCI datasets and 12 cancer gene expression profiles, confirm that ISSCE works well on datasets with very high dimensionality and outperforms state-of-the-art semi-supervised clustering ensemble approaches.
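The consensus step described above can be illustrated with a small sketch. It assumes, as is common in cluster ensembles but not stated in the abstract, that the ensemble members are summarized by a co-association matrix (the fraction of members that place two points in the same cluster) and that this matrix serves as the affinity graph for the normalized cut. The function names `co_association` and `normalized_cut` are hypothetical. The code only scores a given partition with the k-way normalized cut objective; it does not implement ISSCE's constraint propagation, member selection, or spectral relaxation.

```python
from typing import List, Sequence


def co_association(labelings: Sequence[Sequence[int]]) -> List[List[float]]:
    """Entry (i, j) is the fraction of ensemble members that put points i and j
    in the same cluster. Cluster IDs only need to be consistent within one
    member, so label permutations across members do not matter."""
    if not labelings:
        raise ValueError("need at least one ensemble member")
    n = len(labelings[0])
    m = len(labelings)
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        if len(labels) != n:
            raise ValueError("every member must label the same n points")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def normalized_cut(S: Sequence[Sequence[float]], partition: Sequence[int]) -> float:
    """k-way normalized cut of a partition on affinity matrix S:
    the sum over clusters A of cut(A, V - A) / assoc(A, V).
    Self-loops (the diagonal of S) are ignored."""
    n = len(S)
    total = 0.0
    for cluster in set(partition):
        inside = {i for i in range(n) if partition[i] == cluster}
        # assoc(A, V): total edge weight from A to every other vertex
        assoc = sum(S[i][j] for i in inside for j in range(n) if j != i)
        # cut(A, V - A): weight of the edges leaving A
        cut = sum(S[i][j] for i in inside for j in range(n) if j not in inside)
        if assoc > 0:
            total += cut / assoc
    return total


if __name__ == "__main__":
    # Three hypothetical ensemble members labeling four points; the third
    # member uses swapped cluster IDs but agrees on the grouping.
    members = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0]]
    S = co_association(members)
    print(normalized_cut(S, [0, 0, 1, 1]))  # 0.0: no edge weight crosses clusters
    print(normalized_cut(S, [0, 1, 0, 1]))  # 2.0: every edge crosses clusters
```

Minimizing this objective exactly is NP-complete, which is why the normalized cut algorithm of Shi and Malik relaxes it into a generalized eigenvector problem; the sketch is only meant to show what the consensus function is trying to minimize.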
