Journal of Multiple-Valued Logic and Soft Computing

Deciding on Number of Clusters by Multi-Objective Optimization and Validity Analysis



Abstract

Clustering is an unsupervised process that classifies a given set of objects into groups. The effectiveness of a clustering approach is judged mainly by its ability to produce clusters that maximize both within-cluster similarity and between-cluster dissimilarity. However, clustering algorithms expect the number of clusters to be specified beforehand, which requires domain expertise. In this study, we demonstrate the effectiveness of different validity indices in guiding a clustering approach that automatically determines the number of clusters before the actual clustering starts. This is achieved by first running a multi-objective genetic algorithm on a sample of the given dataset to find a set of alternative solutions for a given range of possible numbers of clusters. Then, we apply cluster validity indices to find the most appropriate number of clusters. We run the genetic algorithm on a sample rather than the whole dataset because we want to benefit from its power in automatically estimating the number of clusters without suffering from its poor performance as the dataset size increases. Finally, we run CURE to cluster the whole dataset, feeding it the determined number of clusters as input. The reported test results on two datasets demonstrate the applicability, efficiency and effectiveness of the proposed approach.
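The sample-then-validate idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: a plain k-means with farthest-first initialization stands in for the multi-objective genetic algorithm, the mean silhouette width stands in for the validity indices (the abstract does not name which ones are used), and the final CURE step is left out. All function names, parameters and the toy data are assumptions made for illustration.

```python
import math
import random


def farthest_first(points, k, rng):
    """Pick k initial centers: one random point, then repeatedly the point
    farthest from all centers chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, rng, iters=30):
    """Plain Lloyd's k-means. ASSUMPTION: a simple stand-in for the
    multi-objective genetic search mentioned in the abstract."""
    centers = farthest_first(points, k, rng)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width, one possible validity index (higher is better)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items()
            if m != l
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def estimate_k(data, k_range, sample_size, seed=0):
    """Score each candidate k on a random sample and return the best one."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    scores = {k: silhouette(sample, kmeans(sample, k, rng)) for k in k_range}
    return max(scores, key=scores.get), scores


# Toy data (hypothetical): three well-separated 2-D blobs of 100 points each.
gen = random.Random(42)
blob_centers = [(0, 0), (10, 10), (0, 10)]
data = [
    (gen.gauss(cx, 0.5), gen.gauss(cy, 0.5))
    for cx, cy in blob_centers
    for _ in range(100)
]
best_k, scores = estimate_k(data, range(2, 7), sample_size=90)
print(best_k, scores)
```

On blobs as well separated as these, the silhouette would be expected to favor k = 3. In the pipeline the abstract describes, the selected value would then be passed to CURE as the number of clusters for the whole dataset.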
