Journal: Engineering Applications of Artificial Intelligence

A novel data clustering algorithm using heuristic rules based on k-nearest neighbors chain


Abstract

In practice, clustering algorithms often struggle with the complex structure of a dataset, including its data distribution and dimensionality. Moreover, the number of clusters, which many algorithms require as an input, is usually unknown in advance. In this paper, we propose a novel data clustering algorithm that uses heuristic rules based on a k-nearest neighbors chain and does not require the number of clusters as an input parameter. Inspired by the PageRank algorithm, we first use a random walk model to measure the importance of data points. Then, on the basis of the important data points, we build a K-Nearest Neighbors Chain (KNNC) that orders the k nearest neighbors by distance, and we propose two heuristic rules to find the proper number of clusters and the initial clusters. The first heuristic rule is the gap of the KNNC, which reflects the degree of separation of clusters with convex shapes; the second is the nearest neighbor gap of the KNNC, which reflects the inner compactness of a cluster. Comprehensive comparisons on synthetic and real datasets indicate that the proposed algorithm can find the proper number of clusters and achieves performance comparable to, or even better than, popular clustering algorithms.
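The abstract does not spell out how the random walk is defined, so the following is only a minimal sketch of one plausible reading: a PageRank-style power iteration over a directed k-nearest-neighbor graph. The function names `knn_transition_matrix` and `random_walk_importance`, the Euclidean distance, the uniform transition weights, and the damping factor of 0.85 are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def knn_transition_matrix(X, k):
    """Row-stochastic transition matrix of a directed kNN graph (assumed setup).

    Each point links to its k nearest neighbors (Euclidean distance) with
    equal probability 1/k. Requires k < number of points.
    """
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)           # a point is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :k]   # indices of the k nearest neighbors
    P = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    P[rows, nbrs.ravel()] = 1.0 / k
    return P


def random_walk_importance(P, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank-style stationary scores by power iteration (assumed model)."""
    n = P.shape[0]
    r = np.full(n, 1.0 / n)                  # start from the uniform distribution
    for _ in range(max_iter):
        r_next = damping * (r @ P) + (1.0 - damping) / n
        if np.abs(r_next - r).sum() < tol:   # L1 change below tolerance
            return r_next
        r = r_next
    return r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 20 points near the origin plus one distant outlier (synthetic data).
    X = np.vstack([rng.normal(size=(20, 2)), [[100.0, 100.0]]])
    scores = random_walk_importance(knn_transition_matrix(X, k=3))
    print("most important point:", int(np.argmax(scores)))
    print("outlier has the lowest score:", bool(np.isclose(scores[-1], scores.min())))
```

Under these assumptions, points that many others count among their nearest neighbors accumulate high scores, while an isolated point with no incoming links receives only the teleportation mass. How the paper turns such scores into the KNNC and its two gap rules is not described in the abstract, so that part is left out here.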
