Conference: Computer Applications in Industry and Engineering

Parameter Reduction for Density-based Clustering on Large Data Sets

Abstract

Clustering on large datasets has become one of the most intensively studied areas as data volumes increase. One problem with clustering large datasets is that little domain knowledge is available to determine the input parameters. In density-based clustering, the main input is the minimum neighborhood radius. The problem becomes more difficult when the clusters have different densities. In this paper, we explore an automatic approach to determining the minimum neighborhood radius based on the distribution of the dataset. The algorithm, MINR, was developed from many experiments and observations to determine the minimum neighborhood radii for clusters of different densities. MINR can be used together with any density-based clustering method to form a nonparametric clustering algorithm. In this paper, we combine MINR with the enhanced DBSCAN, e-DBSCAN. Experiments show that our approach is more efficient and scalable than TURN*.
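
The abstract does not describe how MINR derives its radii, so the following is not MINR. It is a minimal sketch of the general idea of reading neighborhood radii off the data distribution, using the k-distance heuristic commonly paired with DBSCAN. The function names `k_distances` and `candidate_radii`, the neighbor count `k`, and the `jump` ratio threshold are all assumptions made for illustration.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted ascending.

    Brute force (O(n^2 log n)); fine for a small illustration only.
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

def candidate_radii(points, k=3, jump=2.0):
    """Hypothetical heuristic: split the sorted k-distance curve wherever it
    grows by more than a factor of `jump`, and report the last value before
    each split (plus the final value) as one candidate radius per density level.
    """
    kd = k_distances(points, k)
    radii = []
    for prev, cur in zip(kd, kd[1:]):
        if prev > 0 and cur / prev > jump:
            radii.append(prev)
    radii.append(kd[-1])
    return radii

# Two grid-shaped clusters with different densities (spacing 1 and spacing 10).
dense = [(x, y) for x in range(5) for y in range(5)]
sparse = [(1000 + 10 * x, 1000 + 10 * y) for x in range(5) for y in range(5)]
radii = candidate_radii(dense + sparse, k=3)
print(radii)  # one candidate radius per density level
```

Each returned radius could then be passed as the neighborhood radius to a density-based method, one pass per density level. A real dataset would need a noise-tolerant way to detect the jumps and a spatial index instead of the brute-force neighbor search.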