Journal: Chemometrics and Intelligent Laboratory Systems

All sparse PCA models are wrong, but some are useful. Part I: Computation of scores, residuals and explained variance

Abstract

Sparse Principal Component Analysis (sPCA) is a popular matrix factorization approach based on Principal Component Analysis (PCA) that combines variance maximization and sparsity with the ultimate goal of improving data interpretation. When moving from PCA to sPCA, there are a number of implications that the practitioner needs to be aware of. An important one is that scores and loadings in sPCA may not be orthogonal. For this reason, the traditional way of computing scores, residuals and explained variance in classical PCA can lead to unexpected properties, and therefore to incorrect interpretations, in sPCA. This also affects how sPCA components should be visualized. In this paper we illustrate this problem both theoretically and numerically, using simulations for several state-of-the-art sPCA algorithms, and provide the proper computation of each of these elements. We show that sPCA approaches present disparate and limited performance when modeling noise-free, sparse data. In a follow-up paper, we discuss the theoretical properties that lead to this undesired behavior. We title this series of papers after George Box's famous phrase "All models are wrong, but some are useful", with the same original meaning: sPCA models are only approximations of reality and have structural limitations that the practitioner should take into account, but, properly applied, they can be useful tools to understand data.