Conference: International Conference on Computer Simulation in Risk Analysis and Hazard Mitigation

A kernel density smoothing method for determining an optimal number of clusters in continuous data

Abstract

While data clustering algorithms are becoming increasingly popular across scientific, industrial and social data mining applications, model complexity remains a major challenge. Most clustering algorithms do not incorporate a mechanism for finding an optimal scale parameter that corresponds to an appropriate number of clusters. We propose BASINS^(-1), a kernel density smoothing-based approach to data clustering. Its main ideas derive from two unsupervised clustering approaches: kernel density estimation (KDE) and scale-spacing clustering (SSC). The method determines the optimal number of clusters by first finding dense regions in the data and then separating them based on data-dependent parameter estimates. The optimal number of clusters is determined from different levels of smoothing, after the inherent number of arbitrarily shaped clusters has been detected without a priori information. We demonstrate the applicability of the proposed method under both nested and non-nested hierarchical clustering methodologies. Simulated and real data results are presented to validate the performance of the method, with repeated runs showing high accuracy and reliability.
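The general idea of reading a cluster count off several smoothing levels can be illustrated with a minimal Python sketch. This is not the BASINS procedure, whose details the abstract does not give. It is a generic one-dimensional illustration that assumes a Gaussian kernel, treats each local maximum of the density estimate as one dense region, and picks the count that survives across the most bandwidths. All function names and the toy data are hypothetical.

```python
import math


def gaussian_kde(samples, bandwidth, grid):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


def count_modes(density):
    """Count interior local maxima; a flat pair of grid points counts once."""
    return sum(
        1
        for i in range(1, len(density) - 1)
        if density[i - 1] < density[i] >= density[i + 1]
    )


def modes_by_bandwidth(samples, bandwidths, grid_size=400):
    """Map each bandwidth (smoothing level) to the number of density modes."""
    lo, hi = min(samples), max(samples)
    result = {}
    for h in bandwidths:
        # Pad the grid by three bandwidths so the tails are covered.
        start, stop = lo - 3.0 * h, hi + 3.0 * h
        step = (stop - start) / (grid_size - 1)
        grid = [start + i * step for i in range(grid_size)]
        result[h] = count_modes(gaussian_kde(samples, h, grid))
    return result


def most_persistent_count(mode_counts):
    """Return the mode count seen at the most bandwidths (ties: fewer modes)."""
    tally = {}
    for k in mode_counts.values():
        tally[k] = tally.get(k, 0) + 1
    return max(tally, key=lambda k: (tally[k], -k))


if __name__ == "__main__":
    # Toy data (assumption): three tight groups centred at 0, 5 and 10.
    data = [c + d for c in (0.0, 5.0, 10.0) for d in (-0.6, -0.3, 0.0, 0.3, 0.6)]
    counts = modes_by_bandwidth(data, [0.4, 0.6, 0.8, 1.0, 1.5, 6.0])
    print(counts, most_persistent_count(counts))
```

In this sketch the result depends on which bandwidths are tried: a grid dominated by very large values will always favour a single mode, so the candidate bandwidths should span the scales of interest.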
