Sparse PCA via Bipartite Matchings

Abstract

We consider the following multi-component sparse PCA problem: given a set of data points, we seek to extract a small number of sparse components with disjoint supports that jointly capture the maximum possible variance. Such components can be computed one by one, repeatedly solving the single-component problem and deflating the input data matrix, but this greedy procedure is suboptimal. We present a novel algorithm for sparse PCA that jointly optimizes multiple disjoint components. The extracted features capture variance that lies within a multiplicative factor arbitrarily close to 1 from the optimal. Our algorithm is combinatorial and computes the desired components by solving multiple instances of the bipartite maximum weight matching problem. Its complexity grows as a low order polynomial in the ambient dimension of the input data, but exponentially in its rank. However, it can be effectively applied on a low-dimensional sketch of the input data. We evaluate our algorithm on real datasets and empirically demonstrate that in many cases it outperforms existing, deflation-based approaches.
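The abstract does not spell out how the matching instances are constructed, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes a hypothetical subproblem: given a fixed d x k matrix W, pick k disjoint supports of size s so that the sum over components of the squared entries W[i][j] on each support is as large as possible. By the Cauchy-Schwarz inequality, for a fixed support the best unit-norm vector is the normalized restriction of the corresponding column of W, so choosing the supports reduces to a maximum weight bipartite matching between features and (component, slot) pairs. The names `hungarian_min` and `disjoint_sparse_components` are made up for this example.

```python
import math


def hungarian_min(cost):
    """Min-cost assignment of every row to a distinct column (rows <= cols).

    Standard O(n^2 m) Hungarian algorithm with potentials, 1-based inside.
    Returns match, where match[r] is the column assigned to row r.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (0 = none)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


def disjoint_sparse_components(W, s):
    """Assumed subproblem: W is d x k (list of rows); returns a d x k matrix X
    whose columns have disjoint supports of size at most s and unit norm."""
    d, k = len(W), len(W[0])
    assert k * s <= d, "need at least k*s features"
    # One row per (component j, slot) pair, one column per feature i.
    # Negated squared entries turn max-weight matching into min-cost assignment.
    cost = [[-W[i][j] ** 2 for i in range(d)]
            for j in range(k) for _ in range(s)]
    match = hungarian_min(cost)
    X = [[0.0] * k for _ in range(d)]
    for row, i in enumerate(match):
        j = row // s              # component that owns this slot
        X[i][j] = W[i][j]
    for j in range(k):            # normalize each column to unit length
        norm = math.sqrt(sum(X[i][j] ** 2 for i in range(d)))
        if norm > 0:
            for i in range(d):
                X[i][j] /= norm
    return X
```

For instance, with `W = [[3, 0], [2, 1], [0, 4], [1, 2]]` and `s = 2`, the first component is supported on features 0 and 1 and the second on features 2 and 3. How the candidate matrices W are generated, and how many matching instances are solved, is not stated in the abstract and is left out of this sketch.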