Published in: IEEE International Conference on Big Data

Finding Stable Clustering for Noisy Data via Structure-Aware Representation

Abstract

Clustering is one of the most prominent topics in machine learning. Among the many clustering methods that have been proposed, spectral clustering has attracted much attention. In practice, however, spectral clustering is highly sensitive to noisy data, and it often requires a post-processing step (e.g., k-means on the eigenvectors) to obtain clustering indicators, which may not be optimal. It also scales poorly to large data sets because of its eigendecomposition step. Here we propose a structure-aware clustering model to address these issues. A high-quality affinity matrix is extracted from the original noisy data by a sparse additive decomposition and is used to approximate the ideal clustering structure. We then jointly learn this affinity matrix and the spectral embedding in a unified model, which makes the method robust to noise and yields the optimal clustering indicators without any post-processing step. We further improve clustering stability by considering the Laplacian eigengap of the affinity matrix, and show that the larger the Laplacian eigengap, the more stable the clustering results. We also introduce a speedup strategy for efficiently computing eigenvectors of large matrices. Experimental results demonstrate that the proposed model outperforms existing approaches on noisy data.
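
The link between noise and the Laplacian eigengap mentioned in the abstract can be illustrated with a small sketch. This is not the model proposed in the paper: it only computes the conventional eigengap of a symmetric normalized Laplacian, which is an assumed choice since the abstract does not say which Laplacian is used. The helper names (`normalized_laplacian`, `eigengap`, `two_block_affinity`) and the toy two-cluster data are hypothetical.

```python
import numpy as np

def normalized_laplacian(W):
    """Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2} (assumed form)."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard against zero degree
    return np.eye(W.shape[0]) - (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]

def eigengap(W, k):
    """Gap between the k-th and (k+1)-th smallest Laplacian eigenvalues."""
    eigvals = np.linalg.eigvalsh(normalized_laplacian(W))  # ascending order
    return eigvals[k] - eigvals[k - 1]

def two_block_affinity(n_per_block, noise, seed=0):
    """Toy affinity: two ideal clusters plus symmetric uniform noise in [0, noise)."""
    rng = np.random.default_rng(seed)
    n = 2 * n_per_block
    labels = np.repeat([0, 1], n_per_block)
    W = (labels[:, None] == labels[None, :]).astype(float)  # ideal block structure
    E = rng.uniform(0.0, noise, size=(n, n))
    W = W + (E + E.T) / 2.0   # keep the affinity symmetric
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

clean_gap = eigengap(two_block_affinity(20, noise=0.0), k=2)
noisy_gap = eigengap(two_block_affinity(20, noise=0.8), k=2)
print(f"eigengap, clean affinity: {clean_gap:.3f}")
print(f"eigengap, noisy affinity: {noisy_gap:.3f}")
```

On this toy data the ideal block-structured affinity has a larger eigengap at k = 2 than the noise-corrupted one, which is the intuition behind recovering a cleaner affinity matrix before computing the spectral embedding.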