All sparse PCA models are wrong, but some are useful. Part I: Computation of scores, residuals and explained variance

Camacho J.; Smilde A. K.; Saccenti E.; Westerhuis J. A.

Abstract

Sparse Principal Component Analysis (sPCA) is a popular matrix factorization approach based on Principal Component Analysis (PCA) that combines variance maximization and sparsity, with the ultimate goal of improving data interpretation. Moving from PCA to sPCA has a number of implications that practitioners need to be aware of. An important one is that sPCA scores and loadings may not be orthogonal. As a result, computing scores, residuals and explained variance in the traditional way used for classical PCA can lead to unexpected properties, and therefore to incorrect interpretations, in sPCA. This also affects how sPCA components should be visualized. In this paper we illustrate this problem both theoretically and numerically, using simulations with several state-of-the-art sPCA algorithms, and we show how to compute each of these elements correctly. We also show that sPCA approaches exhibit disparate and limited performance when modeling noise-free, sparse data. In a follow-up paper, we discuss the theoretical properties that lead to this undesired behavior. We title this series of papers after George Box's famous phrase "All models are wrong, but some are useful", with the same original meaning: sPCA models are only approximations of reality and have structural limitations that practitioners should take into account, but, properly applied, they can be useful tools for understanding data.
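The problem the abstract describes can be sketched numerically. The example below is a minimal NumPy illustration, assuming a loading matrix P whose columns are sparse and unit-norm but not orthogonal (P^T P ≠ I). It contrasts the classical PCA shortcut T = XP with least-squares ("regression") scores T = XP(P^T P)^{-1}, which are one standard remedy; we do not claim they reproduce the authors' exact procedure. The function names and the toy data are hypothetical.

```python
import numpy as np

def naive_scores(X, P):
    # Classical PCA shortcut T = X P; a true projection only if P^T P = I.
    return X @ P

def regression_scores(X, P):
    # Least-squares scores T = X P (P^T P)^{-1}, the T minimizing ||X - T P^T||_F.
    # Solved with lstsq (P @ T^T ~= X^T) rather than an explicit matrix inverse.
    return np.linalg.lstsq(P, X.T, rcond=None)[0].T

def residuals(X, T, P):
    return X - T @ P.T

def explained_variance(X, T, P):
    # Fraction of the total sum of squares of (centered) X reproduced by T P^T.
    E = residuals(X, T, P)
    return 1.0 - np.sum(E ** 2) / np.sum(X ** 2)

# Hypothetical toy data: 50 observations, 4 variables, mean-centered.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
X -= X.mean(axis=0)

# Two sparse, unit-norm loadings that share variable 2, so p1 . p2 = 0.5.
P = np.array([[1, 0], [1, 1], [0, 1], [0, 0]]) / np.sqrt(2)

T_naive = naive_scores(X, P)
T_reg = regression_scores(X, P)
E_naive = residuals(X, T_naive, P)
E_reg = residuals(X, T_reg, P)

print("P^T P =\n", P.T @ P)
print("||E_naive P|| =", np.linalg.norm(E_naive @ P))  # nonzero: residual overlaps the loadings
print("||E_reg P||   =", np.linalg.norm(E_reg @ P))    # ~0: residual orthogonal to the loadings
print("EV naive:     ", explained_variance(X, T_naive, P))
print("EV regression:", explained_variance(X, T_reg, P))
```

Because P^T P ≠ I here, the shortcut scores leave residuals that still correlate with the loadings, so in general the total sum of squares no longer splits into "explained" plus "residual" parts. With the least-squares scores the residuals are orthogonal to P and that split holds exactly.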

Bibliographic details

Source
Chemometrics and Intelligent Laboratory Systems, 2020, 10 pages
Authors
Camacho J.; Smilde A. K.; Saccenti E.; Westerhuis J. A.
Affiliations

University of Granada, Department of Signal Theory, Telematics and Communications, School of Computer Science and Telecommunications, CITIC, Granada, Spain;
University of Amsterdam, Biosystems Data Analysis, Amsterdam, Netherlands;
Wageningen University & Research, Laboratory of Systems and Synthetic Biology, Wageningen, Netherlands;
University of Amsterdam, Biosystems Data Analysis, Amsterdam, Netherlands

Indexing information
Full-text format: PDF
Text language: English
Chinese Library Classification: Metrology
Keywords
Sparse principal component analysis; Explained variance; Scores; Residuals; Exploratory data analysis

机译：稀疏主成分分析;解释方差;分数;残差;探索性数据分析;

Similar articles

1. Computation of mean and variance of the radiotherapy dose for PCA-modeled random shape and position variations of the target [J]. E Budiarto, M Keijzer, P R M Storchi. Physics in Medicine and Biology, 2014, no. 2
2. Where Did I Go Wrong?: Explaining Errors in Business Process Models [C]. Niels Lohmann, Dirk Fahland. International Conference on Business Process Management, 2014
3. Bifactor models, explained common variance (ECV), and the usefulness of scores from unidimensional item response theory analyses [D]. Quinn, Hally O'Connor. 2014
4. A New Explained-Variance Based Genetic Risk Score for Predictive Modeling of Disease Risk [O]. Ronglin Che, Alison A. Motsinger-Reif