Journal: Neural Networks, IEEE Transactions on

Spectral Embedded Clustering: A Framework for In-Sample and Out-of-Sample Spectral Clustering

Abstract

Spectral clustering (SC) methods have been successfully applied to many real-world applications. Their success rests largely on the manifold assumption, namely, that two nearby data points in the high-density region of a low-dimensional data manifold have the same cluster label. However, this assumption might not always hold for high-dimensional data. When the data do not exhibit a clear low-dimensional manifold structure (e.g., high-dimensional and sparse data), the clustering performance of SC degrades and can become even worse than that of $K$-means clustering. In this paper, motivated by the observation that the true cluster assignment matrix for high-dimensional data can always be embedded in a linear space spanned by the data, we propose the spectral embedded clustering (SEC) framework, in which a linearity regularization is explicitly added to the objective function of SC methods. More importantly, the proposed SEC framework can naturally deal with out-of-sample data. We also present a new Laplacian matrix constructed from a local regression of each pattern and incorporate it into our SEC framework to capture both local and global discriminative information for clustering. Comprehensive experiments on eight real-world high-dimensional datasets demonstrate the effectiveness and advantages of our SEC framework over existing SC methods and $K$-means-based clustering methods. Our SEC framework significantly outperforms SC using the Nyström algorithm on unseen data.
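To make the idea of a linearity regularization more concrete, the following is a minimal sketch in Python and should not be read as the authors' implementation. The abstract does not give the objective, so everything specific here is an assumption: the penalty is taken to be a ridge-regression residual between the embedding `F` and a linear map of the data, the weights `mu` and `gamma` and the function names are hypothetical, the graph is a dense Gaussian affinity with an unnormalized Laplacian, and the final labels come from plain K-means on the rows of the embedding. The local-regression Laplacian mentioned in the abstract is not implemented.

```python
import numpy as np


def unnormalized_laplacian(X, sigma=1.0):
    """Graph Laplacian D - A from a dense Gaussian affinity (an assumed choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return np.diag(A.sum(axis=1)) - A


def nearest(F, centers):
    """Index of the closest center for every row of F."""
    diff = F[:, None, :] - centers[None, :, :]
    return np.argmin(np.sum(diff ** 2, axis=2), axis=1)


def kmeans(F, k, n_iter=100):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [F[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((F - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(F[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = nearest(F, centers)
        updated = np.array([
            F[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return nearest(F, centers), centers


def fit(X, n_clusters, mu=1.0, gamma=1.0, sigma=1.0):
    """Embedding F from the smallest eigenvectors of L + mu * L_lin.

    L_lin is what remains of the assumed penalty
    ||X W + 1 b^T - F||^2 + gamma * ||W||^2 after W and b are eliminated.
    Rows of X are samples.
    """
    n, d = X.shape
    mean = X.mean(axis=0)
    Xc = X - mean                                   # centered data
    H = np.eye(n) - np.full((n, n), 1.0 / n)        # centering matrix
    G = np.linalg.solve(Xc.T @ Xc + gamma * np.eye(d), Xc.T)  # ridge solver, (d, n)
    L_lin = H - Xc @ G
    M = unnormalized_laplacian(X, sigma) + mu * L_lin
    _, vecs = np.linalg.eigh((M + M.T) / 2.0)       # eigenvalues in ascending order
    F = vecs[:, :n_clusters]
    W = G @ F                                       # linear map from data to embedding
    b = F.mean(axis=0)                              # bias, valid for centered inputs
    labels, centers = kmeans(F, n_clusters)
    return {"mean": mean, "W": W, "b": b, "centers": centers, "labels": labels}


def predict(model, X_new):
    """Out-of-sample step: map new points linearly, then pick the nearest center."""
    F_new = (X_new - model["mean"]) @ model["W"] + model["b"]
    return nearest(F_new, model["centers"])


# Toy data: two well-separated blobs in five dimensions.
rng = np.random.default_rng(0)
shift = 10.0 * np.eye(5)[0]
X = np.vstack([rng.normal(0.0, 0.5, (30, 5)), rng.normal(0.0, 0.5, (30, 5)) + shift])
model = fit(X, n_clusters=2)
labels = model["labels"]
new_labels = predict(model, np.vstack([np.zeros(5), shift]))
```

Because the linear map `W` and bias `b` are kept after fitting, a point that was not part of the eigen-decomposition can be embedded with one matrix product, which is one plausible reading of how a linear term lets the framework handle out-of-sample data.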
