...
首页> 外文期刊>Statistics and computing >Minimum spectral connectivity projection pursuit: Divisive clustering using optimal projections for spectral clustering
【24h】

Minimum spectral connectivity projection pursuit: Divisive clustering using optimal projections for spectral clustering

机译:最小的光谱连通性投影追求:使用最佳投影进行光谱聚类的分裂聚类

获取原文
获取原文并翻译 | 示例
           

摘要

We study the problem of determining the optimal low-dimensional projection for maximising the separability of a binary partition of an unlabelled dataset, as measured by spectral graph theory. This is achieved by finding projections which minimise the second eigenvalue of the graph Laplacian of the projected data, which corresponds to a non-convex, non-smooth optimisation problem. We show that the optimal univariate projection based on spectral connectivity converges to the vector normal to the maximum margin hyperplane through the data, as the scaling parameter is reduced to zero. This establishes a connection between connectivity as measured by spectral graph theory and maximal Euclidean separation. The computational cost associated with each eigen problem is quadratic in the number of data. To mitigate this issue, we propose an approximation method using microclusters with provable approximation error bounds. Combining multiple binary partitions within a divisive hierarchical model allows us to construct clustering solutions admitting clusters with varying scales and lying within different subspaces. We evaluate the performance of the proposed method on a large collection of benchmark datasets and find that it compares favourably with existing methods for projection pursuit and dimension reduction for data clustering. Applying the proposed approach for a decreasing sequence of scaling parameters allows us to obtain large margin clustering solutions, which are found to be competitive with those from dedicated maximum margin clustering algorithms.
机译:我们研究确定最佳低维投影的问题,以最大化未标记数据集的二进制分区的可分离性,这是通过频谱图理论进行测量的。这是通过找到使投影数据的图拉普拉斯算子的第二特征值最小化的投影来实现的,这对应于非凸,非平滑的优化问题。我们显示,当缩放参数减小为零时,基于频谱连通性的最佳单变量投影会收敛到通过数据垂直于最大余量超平面的向量。这在通过光谱图理论测得的连通性与最大欧几里得分离之间建立了联系。与每个本征问题相关的计算成本在数据数量上是平方的。为了缓解此问题,我们提出了一种使用具有可证明的近似误差范围的微簇的近似方法。在分裂的分层模型中组合多个二进制分区,使我们能够构建聚类解决方案,以允许聚类规模不同且位于不同子空间中。我们评估了该方法在大量基准数据集上的性能,并发现它与现有的用于数据聚类的投影追踪和降维方法相比具有优势。将提出的方法用于递减的缩放参数序列,可以使我们获得较大的余量聚类解决方案,与专用的最大余量聚类算法相比,该解决方案具有竞争力。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号