Journal: Expert Systems with Applications

Spectral embedded generalized mean based k-nearest neighbors clustering with S-distance


Abstract

The spectral clustering algorithm is widely used in many areas, especially in pattern recognition. Its promising results depend mainly on the efficient construction of the neighborhood graph. In general, the similarity matrix depends on the similarity measure applied between two data points, the selection of k-nearest neighbors (KNN), and the approach used to construct the neighborhood graph. In this study, we integrate the S-distance into spectral clustering, which makes it capable of finding complex and non-linear cluster structures. Moreover, a generalized mean distance-based KNN is proposed to reduce the sensitivity to the value of k. In addition, a symmetry-favored KNN method is applied to construct the neighborhood graph, which reduces the impact of outliers and noisy data points. However, spectral clustering faces scalability and speedup issues on large datasets. Thus, the proposed spectral clustering algorithm is also executed in distributed environments. Several experiments are performed to validate the proposed clustering algorithm on 20 real-world datasets and 3 large datasets. Experimental results demonstrate that the proposed clustering algorithm outperforms some of the baseline methods in terms of accuracy and clustering error rate. Finally, we conduct Wilcoxon's rank-sum test and show that the results of the proposed spectral clustering algorithm are statistically significant.
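The pipeline outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are assumptions because the abstract does not define them: the S-distance is taken in its commonly cited form for strictly positive vectors, sqrt(sum_l [log((x_l + y_l)/2) - (log x_l + log y_l)/2]); edges are Gaussian-weighted; and the symmetry-favored graph is taken as W = A + A^T, where A is the directed KNN adjacency. The generalized mean distance-based neighbor selection and the distributed execution are not reproduced here, and all function names are hypothetical.

```python
import numpy as np

def s_distance_matrix(X):
    """Pairwise S-distance (assumed form) for strictly positive features."""
    logX = np.log(X)
    # log of the arithmetic mean of every feature pair, shape (n, n, d)
    log_mean = np.log((X[:, None, :] + X[None, :, :]) / 2.0)
    # arithmetic mean of the logs, same shape
    mean_log = (logX[:, None, :] + logX[None, :, :]) / 2.0
    sq = np.sum(log_mean - mean_log, axis=2)
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negative round-off

def symmetry_favored_knn_graph(D, k, sigma=None):
    """W = A + A^T (assumed definition): mutual neighbors get double weight."""
    n = D.shape[0]
    if sigma is None:
        sigma = np.median(D[D > 0])  # heuristic bandwidth, an assumption
    A = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(D[i])
        nbrs = [j for j in order if j != i][:k]  # k nearest, excluding self
        A[i, nbrs] = np.exp(-D[i, nbrs] ** 2 / (2.0 * sigma ** 2))
    return A + A.T

def spectral_embedding(W, n_clusters):
    """Row-normalized eigenvectors of the normalized graph Laplacian."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    U = vecs[:, :n_clusters]
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)

def kmeans(U, n_clusters, n_iter=50):
    """Small k-means with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, n_clusters):
        d = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = U[labels == c].mean(axis=0)
    return labels

def s_spectral_clustering(X, n_clusters, k):
    """Chain the steps: S-distance, KNN graph, embedding, k-means."""
    D = s_distance_matrix(X)
    W = symmetry_favored_knn_graph(D, k)
    return kmeans(spectral_embedding(W, n_clusters), n_clusters)
```

Calling `s_spectral_clustering(X, n_clusters=2, k=5)` on an array `X` of strictly positive features returns one integer label per row. The dense `(n, n, d)` intermediate keeps the sketch short but would not scale to the large datasets mentioned in the abstract.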
