Published in: Annual Conference of the IEEE Industrial Electronics Society

An efficient grid-based clustering method by finding density peaks

Abstract

Clustering, or categorizing, an unprocessed data set is essential in many areas. Many successful methods have been published, but they first need to calculate the mutual distances between all data points. This incurs considerable computational cost, which prevents state-of-the-art methods such as clustering by fast search and find of density peaks (FSFDP, published in Science, 2014) from being applied to real-life problems (e.g., those with thousands of data points). This paper describes an efficient grid-based clustering (GBC) method that finds density peaks. It keeps the friendly interactive interface of FSFDP while greatly reducing the computational complexity. The time complexity of FSFDP is O(np(np - 1)/2), whereas our method reduces it to O(np * size(grid)), where np is the number of data points and size(grid) is the size of the grid. Because the size of the grid is always much smaller than np, the time complexity of our approach is almost linearly proportional to np. The presented GBC method calculates the densities and categorizes data sets in much less time, which makes the density-peak-based algorithm practical. The presented algorithm can also cluster high-dimensional data sets. The method was successfully verified by clustering several data sets that are commonly used to test clustering algorithms in published articles. The presented method proved much faster and more efficient at clustering data sets into categories than conventional density-based methods, which makes it preferable.
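The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible grid-based reading of the density-peak idea, not the authors' implementation. The names `grid_density_peaks`, `make_blob`, `cell_size`, and `n_clusters` are hypothetical, and so is the choice to define a cell's density as its point count. Picking peaks automatically by the product of density and distance is also an assumption that stands in for the interactive decision graph of FSFDP. In this sketch the binning takes one pass over the np points and the peak search is quadratic in the number of occupied cells, which is not necessarily the O(np * size(grid)) bound stated above.

```python
import math
from collections import defaultdict


def grid_density_peaks(points, cell_size, n_clusters):
    """Illustrative grid-based density-peak clustering (not the paper's code)."""
    # 1. Bin every point into a grid cell: a single pass over the data.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(x / cell_size) for x in p)
        cells[key].append(idx)

    keys = list(cells)
    # Assumed definition: the density of a cell is its number of points.
    density = [len(cells[k]) for k in keys]

    # 2. Visit cells from densest to sparsest (stable sort breaks ties) and
    #    record, for each cell, its nearest predecessor in that order.
    order = sorted(range(len(keys)), key=lambda i: -density[i])
    delta = [math.inf] * len(keys)   # distance to the nearest denser cell
    parent = [-1] * len(keys)        # index of that nearest denser cell
    for rank, i in enumerate(order):
        for j in order[:rank]:
            d = math.dist(keys[i], keys[j])  # distance in units of cells
            if d < delta[i]:
                delta[i], parent[i] = d, j
    # Convention borrowed from FSFDP: the densest cell gets the largest delta.
    finite = [d for d in delta if d != math.inf]
    delta[order[0]] = max(finite, default=1.0)

    # 3. Peaks are the cells with the largest density * delta score.
    peaks = sorted(order, key=lambda i: -density[i] * delta[i])[:n_clusters]
    cell_label = [-1] * len(keys)
    for cluster_id, i in enumerate(peaks):
        cell_label[i] = cluster_id

    # 4. Every other cell inherits the label of its nearest denser cell.
    #    Walking in density order guarantees the parent is labeled first.
    for i in order:
        if cell_label[i] == -1:
            cell_label[i] = cell_label[parent[i]]

    # 5. Points inherit the label of the cell that contains them.
    labels = [0] * len(points)
    for k, lab in zip(keys, cell_label):
        for idx in cells[k]:
            labels[idx] = lab
    return labels


def make_blob(cx, cy):
    """Hypothetical test data: 9 points in one core cell plus 4 halo points."""
    core = [(cx + 0.2 * i + 0.1, cy + 0.2 * j + 0.1)
            for i in range(3) for j in range(3)]
    halo = [(cx + 1.5, cy + 0.5), (cx - 0.5, cy + 0.5),
            (cx + 0.5, cy + 1.5), (cx + 0.5, cy - 0.5)]
    return core + halo


points = make_blob(0, 0) + make_blob(10, 10)
labels = grid_density_peaks(points, cell_size=1.0, n_clusters=2)
print(labels)
```

With these two well-separated blobs, the 26 points collapse into 10 occupied cells, so the distance computations run over cells rather than over all 325 point pairs. That reduction is the kind of saving the abstract describes, although the exact cost depends on how the grid is chosen.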
