Journal of Computational and Graphical Statistics

Multiple Sample Data Spectroscopic Clustering of Large Datasets Using Nyström Extension

Abstract

In this article, we focus on computational aspects of spectral clustering algorithms that have recently shown promising results in machine learning, statistics, and computer vision. These algorithms cluster observations (of size n) into groups by investigating eigenvectors of an affinity matrix or its Laplacian matrix, both of which are size n×n. However, when the sample size is large, the computation involved in the matrix eigen-decomposition is expensive or even infeasible. To overcome the computation hurdle, subsampling techniques, such as the Nyström extension, have been used to approximate eigenvectors of large matrices. We study statistical properties of this approximation and their influence on the accuracy of various spectral clustering algorithms. We show that the perturbation of the spectrum due to subsampling could lead to a large discrepancy among clustering results. In order to provide accurate and stable results for large datasets, we propose a method to combine multiple subsamples using data spectroscopic clustering and the Nyström extension. In addition, we propose a sparse approximation of the eigenvectors to further speed up computation. Simulation and experiments on real datasets show that our approaches work quickly and provide reasonable results that are more stable across samples than the single sample approach. This article has supplementary material online.

Key Words: Kernel methods; Mixture models; Perturbation theory; Statistical computing

DOI: http://dx.doi.org/10.1080/10618600.2012.672104
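The subsampling idea mentioned in the abstract can be illustrated with a minimal sketch of the standard Nyström extension: eigenvectors of a small m×m affinity matrix, built from a random subsample, are extended to all n observations. This is not the authors' implementation and does not include their multiple-subsample combination or sparse approximation. The Gaussian kernel, the `bandwidth` parameter, and the function names `gaussian_kernel` and `nystrom_eigenvectors` are assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(X, Y, bandwidth):
    """Pairwise Gaussian affinities exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def nystrom_eigenvectors(X, m, k, bandwidth, rng):
    """Approximate the top-k eigenpairs of the n x n affinity matrix
    using only an m-point subsample (hypothetical helper)."""
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)

    K_mm = gaussian_kernel(X[idx], X[idx], bandwidth)  # m x m block
    K_nm = gaussian_kernel(X, X[idx], bandwidth)       # n x m block

    # Eigen-decomposition of the small matrix only: O(m^3), not O(n^3).
    evals, evecs = np.linalg.eigh(K_mm)
    top = np.argsort(evals)[::-1][:k]
    evals, evecs = evals[top], evecs[:, top]

    # Nystrom extension: extend each subsample eigenvector to all n points
    # and rescale the eigenvalues from the m x m to the n x n scale.
    U = np.sqrt(m / n) * (K_nm @ evecs) / evals
    return (n / m) * evals, U, idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated synthetic groups (illustrative data only).
    X = np.vstack([rng.normal(0.0, 0.5, (200, 2)),
                   rng.normal(4.0, 0.5, (200, 2))])
    approx_vals, U, idx = nystrom_eigenvectors(X, m=40, k=2,
                                               bandwidth=1.0, rng=rng)
    print(approx_vals.shape, U.shape)
```

Rerunning the sketch with different random subsamples gives different approximate eigenvectors, which is the subsample-to-subsample variability the abstract refers to. When m equals n, the extension reproduces the exact eigenpairs of the full affinity matrix.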