...
The Annals of Statistics: An Official Journal of the Institute of Mathematical Statistics

FINITE SAMPLE APPROXIMATION RESULTS FOR PRINCIPAL COMPONENT ANALYSIS: A MATRIX PERTURBATION APPROACH



Abstract

Principal component analysis (PCA) is a standard tool for dimensionality reduction of a set of n observations (samples), each with p variables. In this paper, using a matrix perturbation approach, we study the nonasymptotic relation between the eigenvalues and eigenvectors of PCA computed on a finite sample of size n and those of the limiting population PCA as n → ∞. As in machine learning, we present a finite sample theorem which holds with high probability for the closeness between the leading eigenvalue and eigenvector of sample PCA and population PCA under a spiked covariance model. In addition, we consider the relation between finite sample PCA and the asymptotic results in the joint limit p, n → ∞, with p/n = c. We present a matrix perturbation view of the "phase transition phenomenon," and a simple linear-algebra-based derivation of the eigenvalue and eigenvector overlap in this asymptotic limit. Moreover, our analysis also applies to finite p, n, where we show that although there is no sharp phase transition as in the infinite case, either as a function of noise level or as a function of sample size n, the eigenvector of sample PCA may exhibit a sharp "loss of tracking," suddenly losing its relation to the (true) eigenvector of the population PCA matrix. This occurs because of a crossover between the eigenvalue due to the signal and the largest eigenvalue due to noise, whose eigenvector points in a random direction.
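The spiked covariance setting in the abstract can be simulated in a few lines. What follows is a minimal sketch and not the authors' code. It assumes a rank-one model x = √β·u·v + σ·ξ with Gaussian u and ξ, so the population covariance is βvvᵀ + σ²I. The function name `spiked_pca_overlap` is hypothetical. The function returns the leading eigenvalue of the sample covariance and the squared overlap ⟨v̂, v⟩² between the leading sample eigenvector and the true direction.

```python
import numpy as np

def spiked_pca_overlap(n, p, beta, sigma=1.0, seed=0):
    """Draw n samples from a rank-one spiked covariance model and return
    (leading sample eigenvalue, squared overlap with the true direction).

    Assumed (hypothetical) parametrization for illustration:
        x = sqrt(beta) * u * v + sigma * xi,  u ~ N(0, 1), xi ~ N(0, I_p),
    so the population covariance is beta * v v^T + sigma^2 * I_p.
    """
    rng = np.random.default_rng(seed)
    v = np.zeros(p)
    v[0] = 1.0                                # true (population) leading eigenvector
    u = rng.standard_normal(n)                # random signal amplitudes
    noise = rng.standard_normal((n, p))       # isotropic Gaussian noise
    X = np.sqrt(beta) * np.outer(u, v) + sigma * noise   # n x p data matrix
    S = X.T @ X / n                           # sample covariance (mean known to be zero)
    eigvals, eigvecs = np.linalg.eigh(S)      # eigh returns eigenvalues in ascending order
    lam_hat = eigvals[-1]                     # leading sample eigenvalue
    v_hat = eigvecs[:, -1]                    # corresponding unit eigenvector
    return float(lam_hat), float(np.dot(v_hat, v) ** 2)

# Many samples, strong spike: the sample eigenvector should track v closely.
print(spiked_pca_overlap(n=5000, p=50, beta=4.0))
# Fewer samples than variables, weak spike: the overlap should be small.
print(spiked_pca_overlap(n=200, p=400, beta=0.5))
```

With n much larger than p and a strong spike, the squared overlap should be close to 1. Lowering β or shrinking n relative to p eventually makes the leading sample eigenvector lose its relation to v, which is the behavior the abstract calls "loss of tracking."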

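In the joint limit p, n → ∞ with p/n = c, the phase transition for a rank-one spiked model is usually stated as closed-form limits for the top sample eigenvalue and the squared eigenvector overlap. The sketch below encodes the standard limits for this model, for example as derived by Paul (2007). It assumes the population covariance is βvvᵀ + σ²I, and the paper's own parametrization may differ. `asymptotic_spike_limits` is a hypothetical helper. Above the threshold β/σ² > √c, the signal eigenvalue separates from the noise bulk. Below it, the top eigenvalue converges to the Marchenko-Pastur edge σ²(1 + √c)² and the overlap tends to zero.

```python
import math

def asymptotic_spike_limits(beta, c, sigma=1.0):
    """Limiting leading sample eigenvalue and squared eigenvector overlap
    as p, n -> infinity with p/n = c, assuming the (hypothetical here)
    rank-one model with population covariance beta * v v^T + sigma^2 * I.
    """
    ell = beta / sigma**2                     # signal-to-noise ratio
    if ell > math.sqrt(c):                    # above the phase-transition threshold
        lam = sigma**2 * (1 + ell) * (1 + c / ell)
        overlap2 = (1 - c / ell**2) / (1 + c / ell)
    else:                                     # at/below threshold: top of the noise bulk
        lam = sigma**2 * (1 + math.sqrt(c)) ** 2
        overlap2 = 0.0
    return lam, overlap2

print(asymptotic_spike_limits(beta=4.0, c=0.01))   # strong spike, p much smaller than n
print(asymptotic_spike_limits(beta=0.5, c=2.0))    # weak spike, below threshold
```

The two branches agree at β/σ² = √c, where both give σ²(1 + √c)² and zero overlap. The limits are therefore continuous in β, and the transition appears as a kink at that point.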