Journal of Classification

Effects of Resampling in Determining the Number of Clusters in a Data Set

Abstract

Cluster validation indices are widely used to detect the number of groups in a data set, and applying them is therefore a crucial step in model validation for clustering. The study presented in this paper demonstrates how the accuracy of certain indices can be significantly improved when they are calculated numerous times on data sets resampled from the original data. There are many ways to resample data; in this study, three very common options are used: bootstrapping, data splitting (two subsamples without overlap), and random subsetting (two subsamples with overlap). Index values calculated on resampled data sets are compared to the values obtained from the original data partition. The primary hypothesis of the study states that resampling generally improves index accuracy. The hypothesis is based on the notion of cluster stability: if there are stable clusters in a data set, a clustering algorithm should produce consistent results for data sampled or resampled from the same source. The primary hypothesis was partly confirmed: it holds for external validation measures. The secondary hypothesis states that the resampling strategy itself does not play a significant role. This was also confirmed, although slight deviations between the resampling schemes suggest that splitting yields slightly better results.
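The three resampling schemes named in the abstract can be illustrated with a minimal sketch that produces pairs of index sets. This is not the authors' implementation: the function names, the choice to draw two bootstrap samples as a pair, and the 80% subset fraction are all assumptions made for illustration, since the abstract does not specify them.

```python
import random

def bootstrap_pair(n, rng):
    """Bootstrapping: two samples of n indices, each drawn with replacement.
    (Drawing them as a pair is an assumption made for this sketch.)"""
    a = [rng.randrange(n) for _ in range(n)]
    b = [rng.randrange(n) for _ in range(n)]
    return a, b

def split_pair(n, rng):
    """Data splitting: shuffle once and cut in half, so the two
    subsamples share no observations."""
    idx = list(range(n))
    rng.shuffle(idx)
    half = n // 2
    return idx[:half], idx[half:2 * half]

def subset_pair(n, rng, fraction=0.8):
    """Random subsetting: two independent draws without replacement,
    so the subsamples may overlap. The fraction is a hypothetical choice."""
    size = int(fraction * n)
    a = rng.sample(range(n), size)
    b = rng.sample(range(n), size)
    return a, b

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 10
boot_a, boot_b = bootstrap_pair(n, rng)
split_a, split_b = split_pair(n, rng)
sub_a, sub_b = subset_pair(n, rng)
```

In a stability analysis along the lines the abstract describes, each index set would select rows of the data, the clustering algorithm would be run on both subsamples, and a validation index would be computed and aggregated over many such repetitions.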