Spectral curvature clustering for Hybrid Linear Modeling.

Abstract

The problem of Hybrid Linear Modeling (HLM) is to model and segment data using a mixture of affine subspaces. Many algorithms have been proposed to solve this problem, but a probabilistic analysis of their performance is missing. In this thesis, we develop the Spectral Curvature Clustering (SCC) algorithm as a combination of Govindu's multi-way spectral clustering framework (CVPR 2005) and Ng et al.'s spectral clustering algorithm (NIPS 2001), while introducing a new affinity measure. Our analysis shows that if the given data is sampled from a mixture of distributions concentrated around affine subspaces, then, with high sampling probability, the SCC algorithm segments the underlying clusters well. The quality of the clustering depends on the within-cluster errors, the between-cluster interaction, and a tuning parameter used by SCC. Supported by the theory, we then present several novel techniques for improving the performance of the algorithm. Specifically, we suggest an iterative sampling procedure to improve on the existing uniform sampling strategy, an automatic scheme for inferring the tuning parameter from the data, a precise initialization procedure for K-means, and a simple strategy for isolating outliers. The resulting algorithm requires only linear storage and runs in time linear in the size of the data. We compare it with other state-of-the-art methods on a few artificial instances of affine subspaces. Applications of the algorithm to several real-world problems are also discussed.
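The pipeline outlined in the abstract (sample tuples of points, build multi-way affinities, apply a spectral step in the style of Ng et al., and finish with K-means) can be illustrated with a toy sketch. This is not the thesis's implementation. The abstract does not define the SCC affinity measure, so the sketch substitutes a Gaussian of the distance from each point to the flat through a sampled tuple. It also samples tuples uniformly, fixes `sigma` by hand, and starts K-means from a farthest-point rule. All function names, the synthetic data, and the parameter values are assumptions made for illustration.

```python
import numpy as np


def flat_distance(points, anchors):
    """Distance from each row of `points` to the affine flat through `anchors`."""
    base = anchors[0]
    directions = (anchors[1:] - base).T            # shape (D, d)
    q, _ = np.linalg.qr(directions)                # orthonormal basis of the flat's directions
    centered = points - base
    residual = centered - centered @ q @ q.T       # component orthogonal to the flat
    return np.linalg.norm(residual, axis=1)


def sampled_affinities(points, d, n_columns, sigma, rng):
    """One column per random (d+1)-tuple: affinity of every point to that tuple's flat.

    Stand-in affinity (assumption): Gaussian of the point-to-flat distance.
    """
    columns = []
    for _ in range(n_columns):
        idx = rng.choice(len(points), size=d + 1, replace=False)
        dist = flat_distance(points, points[idx])
        columns.append(np.exp(-dist**2 / (2 * sigma**2)))
    return np.column_stack(columns)                # shape (n, n_columns)


def spectral_embedding(a, k):
    """Embedding from W = A A^T, normalized as in Ng et al., without forming W."""
    degrees = a @ a.sum(axis=0)                    # row sums of A A^T
    normalized = a / np.sqrt(degrees)[:, None]
    u, _, _ = np.linalg.svd(normalized, full_matrices=False)
    top = u[:, :k]                                 # top-k left singular vectors
    return top / np.linalg.norm(top, axis=1, keepdims=True)


def kmeans(x, k, n_iter=50):
    """Lloyd iterations with a deterministic farthest-point start (an assumption)."""
    centers = [x[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((x[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


# Synthetic data (assumption): two noisy parallel lines in the plane, so d = 1.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 1.0, size=(60, 1))
line_a = np.hstack([t[:30], np.zeros((30, 1))])    # points near y = 0
line_b = np.hstack([t[30:], np.ones((30, 1))])     # points near y = 1
points = np.vstack([line_a, line_b]) + 0.002 * rng.standard_normal((60, 2))
truth = np.repeat([0, 1], 30)

affinities = sampled_affinities(points, d=1, n_columns=100, sigma=0.05, rng=rng)
labels = kmeans(spectral_embedding(affinities, k=2), k=2)

# Cluster labels are only defined up to a swap, so score both assignments.
accuracy = max(np.mean(labels == truth), np.mean(labels != truth))
print("fraction of points grouped with their own line:", accuracy)
```

The sketch stores only the tall affinity matrix (one row per point, one column per sampled tuple) rather than a full pairwise matrix. How the thesis chooses the tuning parameter, the sampling scheme, and the K-means initialization is not reproduced here.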

Bibliographic record

  • Author

    Chen, Guangliang

  • Author affiliation

    University of Minnesota

  • Degree-granting institution: University of Minnesota
  • Subjects: Mathematics; Computer Science; Statistics
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 102
  • Original format: PDF
  • Language: English