Annual Conference of the IEEE Industrial Electronics Society

An efficient grid-based clustering method by finding density peaks

Abstract

Clustering, or categorizing, an unprocessed data set is an essential task in many fields. Many published methods first need to compute the pairwise distances between all data points. This incurs considerable computational cost and prevents state-of-the-art methods from being applied to real-world data (e.g., data sets with thousands of data points). One such method is clustering by fast search and find of density peaks (FSFDP, published in Science, 2014). This paper describes an efficient grid-based clustering (GBC) method that finds density peaks. It keeps the user-friendly interactive interface of FSFDP while greatly reducing the computational complexity. The time complexity of FSFDP is O(np(np − 1)/2), whereas our method reduces it to O(np × size(grid)). Here np is the number of data points and size(grid) is the size of the grid. Because the grid size is generally much smaller than np, the time complexity of our approach is almost linear in np. The GBC method computes densities and clusters data sets in much less time, which makes the density-peak-based approach practical. It can also cluster high-dimensional data sets. The method was verified on several data sets that are commonly used in the literature to test clustering algorithms. The results show that it clusters data sets into categories much faster and more efficiently than conventional density-based methods, which makes the proposed method preferable.
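
The abstract does not spell out how GBC works internally. The following is a rough illustration of the general idea only. The function `grid_density_peaks` is a hypothetical name, and `cell_size` and `n_clusters` are assumed parameters. The sketch first bins every point into an axis-aligned grid cell and takes each cell's point count as its density. It then applies the FSFDP steps to the occupied cells instead of to individual points:
- compute each cell's distance to the nearest denser cell;
- pick centres by ranking γ = ρ·δ (an assumed selection rule);
- give every other cell the label of its nearest denser cell.

This is not the authors' implementation. Its cost is one pass over the points plus pairwise work over the m occupied cells, and it is not claimed to match the complexity reported in the paper.

```python
import math
from collections import defaultdict


def grid_density_peaks(points, cell_size, n_clusters):
    """Density-peak clustering evaluated on grid cells instead of raw points.

    Illustrative sketch only: `cell_size`, `n_clusters` and the rho * delta
    centre-selection rule are assumptions, not details from the GBC paper.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    if not points:
        return []

    # Step 1: bin each point into an axis-aligned grid cell (one pass over the data).
    members = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(x / cell_size) for x in p)
        members[key].append(idx)

    # Local density rho of a cell = number of points that fall into it.
    rho = {c: len(ids) for c, ids in members.items()}

    # Cells by decreasing density; ties broken by cell key so the order is strict.
    order = sorted(rho, key=lambda c: (-rho[c], c))

    # Step 2: delta = distance (in cell units) to the nearest denser cell.
    delta, parent = {}, {}
    top = order[0]
    # FSFDP convention: the densest element gets its largest distance to any other.
    delta[top] = max(math.dist(top, c) for c in order)
    parent[top] = None
    for rank in range(1, len(order)):
        c = order[rank]
        nearest = min(order[:rank], key=lambda h: math.dist(c, h))
        delta[c] = math.dist(c, nearest)
        parent[c] = nearest

    # Step 3: rank cells by gamma = rho * delta and keep the top n_clusters as centres.
    # The densest cell always has the maximal gamma (and sorted() is stable),
    # so it is always chosen as a centre.
    ranked = sorted(order, key=lambda c: -rho[c] * delta[c])
    centres = {c: label for label, c in enumerate(ranked[:n_clusters])}

    # Step 4: visit cells from densest to sparsest; a non-centre cell inherits the
    # label of its nearest denser cell, which has already been labelled.
    cell_label = {}
    for c in order:
        cell_label[c] = centres[c] if c in centres else cell_label[parent[c]]

    # Every point takes the label of the cell it fell into.
    labels = [0] * len(points)
    for c, ids in members.items():
        for idx in ids:
            labels[idx] = cell_label[c]
    return labels


if __name__ == "__main__":
    pts = [(0.5, 0.5)] * 3 + [(0.5, 1.5), (10.5, 10.5), (10.5, 10.5), (10.5, 11.5)]
    print(grid_density_peaks(pts, cell_size=1.0, n_clusters=2))  # prints [0, 0, 0, 0, 1, 1, 1]
```

In this sketch, the pairwise distance loop runs over occupied cells rather than over all np points. That is where the saving comes from when many points share a cell. The choice of `cell_size` trades density resolution against the number of cells.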