Conference proceedings: Risk analysis IX

A kernel density smoothing method for determining an optimal number of clusters in continuous data

Abstract

While data clustering algorithms are becoming increasingly popular across scientific, industrial and social data mining applications, model complexity remains a major challenge. Most clustering algorithms do not incorporate a mechanism for finding an optimal scale parameter that corresponds to an appropriate number of clusters. We propose (BASINS~(-1)), a kernel-density smoothing-based approach to data clustering. Its main ideas derive from two unsupervised clustering approaches: kernel density estimation (KDE) and scale-spacing clustering (SSC). The novel method determines the optimal number of clusters by first finding dense regions in the data and then separating them based on data-dependent parameter estimates. The optimal number of clusters is determined from different levels of smoothing, after the inherent number of arbitrarily shaped clusters has been detected without a priori information. We demonstrate the applicability of the proposed method under both nested and non-nested hierarchical clustering methodologies. Simulated and real data results are presented to validate the performance of the method, with repeated runs showing high accuracy and reliability.
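
The abstract does not give the details of the proposed algorithm, so the following is only a rough illustration of the general idea it mentions: counting dense regions (modes) of a kernel density estimate at several levels of smoothing. It is a minimal one-dimensional sketch, not the authors' method. The function names, the toy data, the bandwidth list, and the rule of picking the mode count that persists over the most bandwidths are all assumptions made for this example.

```python
import math

def kde(grid, data, h):
    """Gaussian kernel density estimate of `data`, evaluated at each grid point."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in data)
            for g in grid]

def count_modes(density):
    """Count strict local maxima of a density sampled on a grid."""
    return sum(1 for i in range(1, len(density) - 1)
               if density[i - 1] < density[i] > density[i + 1])

def most_persistent_mode_count(data, bandwidths, grid_size=401):
    """Return (chosen count, per-bandwidth counts).

    Assumed selection rule (illustrative only): choose the mode count that
    occurs for the largest number of bandwidths; ties go to the smaller count.
    """
    pad = max(bandwidths)
    lo, hi = min(data) - pad, max(data) + pad
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    counts = [count_modes(kde(grid, data, h)) for h in bandwidths]
    best = max(sorted(set(counts)), key=counts.count)
    return best, counts

# Toy data (hypothetical): two tight groups centred near 0 and near 10.
data = [i / 10 for i in range(-5, 6)] + [10 + i / 10 for i in range(-5, 6)]
bandwidths = [0.5, 1.0, 2.0, 8.0]  # light to heavy smoothing
best, counts = most_persistent_mode_count(data, bandwidths)
print("modes per bandwidth:", counts)
print("selected number of clusters:", best)
```

Heavier smoothing merges nearby modes, so the count tends to fall as the bandwidth grows; a count that survives across a wide range of bandwidths is one plausible signal of the inherent number of clusters.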
