Source: Bioinformatics (U.S. National Institutes of Health literature)

Supervised principal component analysis for gene set enrichment of microarray data with continuous or survival outcomes

Abstract

Motivation: Gene set analysis allows formal testing of subtle but coordinated changes in a group of genes, such as those defined by the Gene Ontology (GO) or KEGG Pathway databases. We propose a new method for gene set analysis based on principal component analysis (PCA) of the gene expression values in the gene set. PCA is an effective method for reducing high dimensionality and capturing variation in gene expression values. However, one limitation of PCA is that the latent variable identified by the first PC may be unrelated to the outcome.

Results: In the proposed supervised PCA (SPCA) model for gene set analysis, the PCs are estimated from a selected subset of genes that are associated with the outcome. Because outcome information is used in the gene selection step, the method is supervised, hence the name Supervised PCA model. Because of the gene selection step, the test statistic in the SPCA model can no longer be approximated well by a t-distribution. We propose a two-component mixture distribution based on Gumbel extreme value distributions to account for the gene selection step. We show that the proposed method compares favorably with currently available gene set analysis methods on simulated and real microarray data.

Software: The R code for the analysis used in this article is available upon request; we are currently working on implementing the proposed method in an R package.

Contact: .
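The select-then-project idea described in the abstract can be illustrated with a minimal Python sketch for a continuous outcome. This is not the authors' R code. The function name `spca_statistic`, the parameter `n_top`, and the selection rule (keep the genes with the largest absolute Pearson correlation with the outcome) are assumptions made for illustration, since the abstract does not specify how genes are selected. The sketch only computes the statistic; it does not implement the Gumbel-based mixture distribution the abstract proposes for calibrating it.

```python
import numpy as np

def spca_statistic(X, y, n_top=10):
    """Hypothetical SPCA-style statistic for one gene set.

    X: array of shape (genes, samples) with expression values.
    y: continuous outcome of length `samples`.
    n_top: number of genes kept in the selection step (an assumption).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size

    # Center each gene and the outcome across samples.
    Xc = X - X.mean(axis=1, keepdims=True)
    yc = y - y.mean()

    # Supervised step: rank genes by absolute correlation with the outcome.
    denom = np.sqrt((Xc ** 2).sum(axis=1) * (yc ** 2).sum())
    r = np.abs(Xc @ yc) / np.where(denom > 0, denom, np.inf)
    keep = np.argsort(r)[::-1][: min(n_top, X.shape[0])]

    # PCA step: first principal component score of the selected genes.
    S = Xc[keep].T  # samples x selected genes, columns already centered
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    pc1 = U[:, 0] * s[0]

    # Regress the outcome on the PC1 score and form the slope t-statistic.
    sxx = pc1 @ pc1
    beta = (pc1 @ yc) / sxx
    resid = yc - beta * pc1
    sigma2 = (resid @ resid) / (n - 2)
    t = beta / np.sqrt(sigma2 / sxx)
    return t, keep
```

Because the same outcome is used both to select the genes and to test the resulting component, the returned value is inflated relative to an ordinary t-statistic and should not be compared with a t-distribution, which is the problem the abstract's mixture distribution is meant to address. The sign of the statistic is arbitrary, because the sign of a principal component is not determined.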