BMC Bioinformatics

Obtaining insights from high-dimensional data: sparse principal covariates regression



Abstract

Data analysis methods are usually subdivided into two distinct classes: methods for prediction and methods for exploration. In practice, however, there is often a need to learn from the data in both ways. For example, when predicting antibody titers a few weeks after vaccination on the basis of genome-wide mRNA transcription rates, mechanistic insights into the effect of vaccination on the immune system are also sought. Principal covariates regression (PCovR) is a method that combines both purposes. Yet it does not yield insightful representations of the data, because these representations include all the variables. Here, we propose a sparse extension of principal covariates regression such that the resulting solutions are based on an automatically selected subset of the variables. In a simulation study, our method is shown to outperform competing methods such as sparse principal components regression and sparse partial least squares. Furthermore, good performance of the method is illustrated on publicly available data including antibody titers and genome-wide transcription rates for subjects vaccinated against the flu: the genes selected by sparse PCovR are highly enriched for immune-related terms, and the method predicts the titers for an independent test sample well. In comparison, no significantly enriched terms were found for the genes selected by sparse partial least squares, and its out-of-sample prediction was worse. Sparse principal covariates regression is a promising and competitive tool for obtaining insights from high-dimensional data. The source code implementing our proposed method is available from GitHub (https://github.com/katrijnvandeun/SPCovR), together with all scripts used to extract, pre-process, analyze, and post-process the data.
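To make the idea of combining prediction and exploration in one criterion concrete, the following is a minimal sketch and not the authors' implementation. It assumes the weighted-loss form of PCovR commonly given in the literature (a reconstruction error for the predictors and a prediction error for the outcome, balanced by a weight alpha), and it assumes that sparsity is encouraged by lasso and ridge penalties on the component weights. The abstract states none of these details. The function name `spcovr_loss` and all parameter names are hypothetical.

```python
import numpy as np


def spcovr_loss(X, y, W, Px, py, alpha=0.5, lasso=0.0, ridge=0.0):
    """Evaluate an assumed sparse-PCovR-style objective (illustrative only).

    X  : (n, p) predictor matrix
    y  : (n,)   outcome vector
    W  : (p, R) component weights; zeros in W mean "variable not selected"
    Px : (p, R) loadings used to reconstruct X from the components
    py : (R,)   regression weights used to predict y from the components
    alpha in [0, 1] balances reconstruction of X against prediction of y.
    """
    T = X @ W  # component scores, shape (n, R)
    # Share of the variation in X that the components fail to reconstruct.
    recon = np.sum((X - T @ Px.T) ** 2) / np.sum(X ** 2)
    # Share of the variation in y that the components fail to predict.
    pred = np.sum((y - T @ py) ** 2) / np.sum(y ** 2)
    # Assumed penalties: lasso pushes weights to exactly zero, ridge shrinks them.
    penalty = lasso * np.abs(W).sum() + ridge * np.sum(W ** 2)
    return alpha * recon + (1.0 - alpha) * pred + penalty


# Toy high-dimensional setting: more variables (p) than observations (n).
rng = np.random.default_rng(0)
n, p, R = 20, 50, 2
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

W0 = np.zeros((p, R))  # no variable selected
Px0 = np.zeros((p, R))
py0 = np.zeros(R)
baseline = spcovr_loss(X, y, W0, Px0, py0, alpha=0.3, lasso=0.1, ridge=0.1)
print(baseline)
```

With all weights at zero, both normalized error terms equal 1 and the penalties vanish, so the baseline loss is 1 for any alpha. A fitted solution should fall below this value. Under these assumptions, setting alpha close to 1 emphasizes exploration (reconstructing X) and setting it close to 0 emphasizes prediction of y.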
