We consider the problem of selecting the 'best' subset of exactly k columns from an m x n matrix A. In particular, we present and analyze a novel two-stage algorithm that runs in O(min{mn^2, m^2 n}) time and returns as output an m x k matrix C consisting of exactly k columns of A. In the first stage (the randomized stage), the algorithm randomly selects O(k log k) columns according to a judiciously chosen probability distribution that depends on information in the top-k right singular subspace of A. In the second stage (the deterministic stage), the algorithm applies a deterministic column-selection procedure to select and return exactly k columns from the set of columns selected in the first stage. Let C be the m x k matrix containing those k columns, let P_C denote the projection matrix onto the span of those columns, and let A_k denote the 'best' rank-k approximation to the matrix A as computed with the singular value decomposition. Then, we prove that
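The two-stage structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' exact procedure, because the abstract does not give the sampling distribution, the constant hidden in O(k log k), or the deterministic selection rule. The sketch assumes a sampling distribution that mixes the leverage scores of the top-k right singular vectors with a uniform term, an arbitrary constant of 4 in the sample size, and greedy norm-based pivoting as the deterministic stage. The names `two_stage_column_select`, `greedy_pivot`, and the parameter `c` are hypothetical.

```python
import numpy as np


def greedy_pivot(M, k):
    """Deterministically pick k column indices of M by greedy pivoting.

    Assumption: this stands in for the unspecified deterministic stage.
    Each step takes the column with the largest residual norm, then
    removes that direction from all columns.
    """
    R = np.array(M, dtype=float)
    available = np.ones(R.shape[1], dtype=bool)
    chosen = []
    for _ in range(k):
        norms = np.where(available, np.sum(R**2, axis=0), -1.0)
        j = int(np.argmax(norms))
        chosen.append(j)
        available[j] = False
        nrm = np.linalg.norm(R[:, j])
        if nrm > 1e-12:  # skip deflation for a numerically zero column
            q = R[:, j] / nrm
            R -= np.outer(q, q @ R)
    return chosen


def two_stage_column_select(A, k, c=None, rng=None):
    """Return (indices, C) where C holds exactly k columns of A."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    assert 1 <= k <= min(m, n)

    # Top-k right singular subspace of A.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk_t = Vt[:k, :]  # shape (k, n)

    # Stage 1 (randomized). Assumed distribution: leverage scores of the
    # top-k right singular vectors mixed with a uniform term, so every
    # column has nonzero probability.
    p = 0.5 * np.sum(Vk_t**2, axis=0) / k + 0.5 / n
    p /= p.sum()
    if c is None:
        # Assumed constant 4; the abstract only states O(k log k).
        c = min(n, max(k, int(np.ceil(4 * k * np.log(k + 1)))))
    sampled = np.sort(rng.choice(n, size=c, replace=False, p=p))

    # Stage 2 (deterministic): keep exactly k of the sampled columns.
    chosen = greedy_pivot(Vk_t[:, sampled], k)
    cols = sampled[chosen]
    return cols, A[:, cols]
```

One way to use the sketch is to compare the residual of projecting A onto the span of C with the residual of the best rank-k approximation A_k. Since P_C A has rank at most k, the first residual can never be smaller than the second in the Frobenius norm.

```python
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 30))
cols, C = two_stage_column_select(A, k=5, rng=0)

resid = np.linalg.norm(A - C @ np.linalg.pinv(C) @ A)   # ||A - P_C A||_F
s = np.linalg.svd(A, compute_uv=False)
best = np.sqrt(np.sum(s[5:] ** 2))                      # ||A - A_k||_F
```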