BMC Bioinformatics

Statistical hypothesis testing of factor loading in principal component analysis and its application to metabolite set enrichment analysis

Abstract

Background: Principal component analysis (PCA) is widely used to visualize high-dimensional metabolomic data in a two- or three-dimensional subspace. In metabolomics, a subset of metabolites (e.g., the top 10) is often selected subjectively on the basis of factor loading in PCA, and biological inferences are then made for these metabolites. However, this approach may bias biological inferences because the metabolites are not selected objectively with statistical criteria.

Results: We propose a statistical procedure that selects metabolites by statistical hypothesis testing of the factor loading in PCA and makes biological inferences about the significant metabolites with a metabolite set enrichment analysis (MSEA). The procedure relies on the fact that, for autoscaled data, the eigenvector in PCA is proportional to the correlation coefficient between the PC score and each metabolite level. We applied this approach to two sets of metabolomic data from mouse liver samples: 136 of 282 metabolites in the first case study and 66 of 275 metabolites in the second case study were statistically significant. This result suggests that fixing the number of metabolites before the analysis is inappropriate, because the number of significant metabolites differs between studies when factor loading is used in PCA. Moreover, when an MSEA was performed on the significant metabolites, the significant metabolic pathways detected were consistent with previous biological knowledge.

Conclusions: When factor loading in PCA is used, metabolites must be selected statistically to make unbiased biological inferences from metabolomic data. We propose a statistical procedure that selects metabolites by statistical hypothesis testing of the factor loading in PCA and draws biological inferences about the significant metabolites with MSEA. We have developed an R package, “mseapca”, to facilitate this approach. The “mseapca” package is publicly available at the CRAN website.
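The proportionality between the eigenvector and the correlation coefficient stated in the Results can be checked numerically. The following is a minimal sketch, not the “mseapca” implementation: it uses NumPy on simulated data, the function names `pca_loadings` and `loading_t_statistic` are hypothetical, and the test statistic shown is the standard t statistic for a Pearson correlation coefficient with n - 2 degrees of freedom, which is assumed here to be a suitable test for the factor loading.

```python
import numpy as np


def pca_loadings(X):
    """PCA on autoscaled data; returns autoscaled data, scores, eigenvalues, loadings."""
    # Autoscale: each column (metabolite) gets zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    n = Z.shape[0]
    # The covariance matrix of autoscaled data is the correlation matrix.
    R = Z.T @ Z / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(R)
    # eigh returns ascending eigenvalues; reorder so PC1 comes first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs
    # Factor loading: eigenvector column k scaled by sqrt(eigenvalue k).
    loadings = eigvecs * np.sqrt(eigvals)
    return Z, scores, eigvals, loadings


def loading_t_statistic(r, n):
    """t statistic for a correlation coefficient r from n samples (assumed test)."""
    return r * np.sqrt((n - 2) / (1.0 - r ** 2))


# Simulated data (hypothetical): 20 samples, 5 "metabolites".
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X[:, 1] += X[:, 0]  # make two columns correlated

Z, scores, eigvals, loadings = pca_loadings(X)

# Correlation between the PC1 score and each autoscaled metabolite level.
corr_pc1 = np.array(
    [np.corrcoef(scores[:, 0], Z[:, j])[0, 1] for j in range(Z.shape[1])]
)

# The PC1 factor loadings should match these correlations.
loadings_match = np.allclose(loadings[:, 0], corr_pc1)

# One t statistic per metabolite for the PC1 loading.
t_pc1 = loading_t_statistic(loadings[:, 0], n=X.shape[0])
```

Two-sided p-values would follow by comparing each entry of `t_pc1` against a t distribution with n - 2 degrees of freedom. With several hundred metabolites, as in the two case studies, a multiple-testing correction would normally be considered as well; neither step is shown here.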