BMC Genomics

Concordant integrative gene set enrichment analysis of multiple large-scale two-sample expression data sets

Abstract

Background
Gene set enrichment analysis (GSEA) is an important approach to analyzing coordinated expression changes at the pathway level. Although many statistical and computational methods have been proposed for GSEA, concordant integrative GSEA of multiple expression data sets has not been well addressed. Among related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment.

Methods
We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Because of noise in the data, what we observe in experiments may not reflect the underlying truth. Although these categories are not observed in practice, they can be treated as latent components in a mixture model framework. We then define concordant gene set enrichment mathematically and calculate its probability based on a three-component multivariate normal mixture model. The corresponding false discovery rate can be calculated and used to rank gene sets.

Results
We used three published lung cancer microarray gene expression data sets to illustrate the proposed method. An analysis based on the first two data sets was compared with a previously published result obtained by conducting GSEA separately on each individual data set. This comparison illustrates the advantage of the proposed concordant integrative gene set enrichment analysis. Then, using a relatively new and larger pathway collection, we applied our method to an integrative analysis of the first two data sets and also of all three data sets. Both analyses identified many gene sets with low false discovery rates, and the two sets of results were consistent with each other. A further exploration based on the KEGG cancer pathway collection showed that our proposed method could identify a majority of these pathways.

Conclusions
This study illustrates that a concordant integrative analysis of multiple large-scale two-sample gene expression data sets can improve detection power and discovery consistency.
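The mixture-model idea in the Methods can be illustrated with a minimal sketch. It assumes each gene has one z-score per data set, that each data set's latent state is 0 (no change), +1 (positive change) or -1 (negative change), and that the z-score given the state follows N(0, 1), N(+mu, sd^2) or N(-mu, sd^2). The prior weights over the joint state configurations, the values of mu and sd, and the function names `concordance_probability` and `estimated_fdr` are hypothetical; the abstract does not give the paper's parameterization, how parameters are estimated, or how gene-level quantities are aggregated into a gene-set probability. The FDR here is the common Bayesian estimate (average posterior probability of a non-concordant state among the selected items), which may differ from the authors' calculation.

```python
import math
from itertools import product

# Latent states per data set: 0 = no change, 1 = positive change, -1 = negative change
STATES = (0, 1, -1)


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def concordance_probability(z, weights, mu=2.0, sd=1.0):
    """Posterior probability that a gene changes in the same direction in every data set.

    z:       one z-score per data set
    weights: hypothetical prior probability of each joint configuration (tuple of 0/1/-1)
    mu, sd:  hypothetical parameters of the positive/negative change components
    """
    joint = {}
    for config in product(STATES, repeat=len(z)):
        lik = weights.get(config, 0.0)
        for zk, s in zip(z, config):
            # The null component is N(0, 1); changed components are shifted by +/- mu
            lik *= normal_pdf(zk, 0.0, 1.0) if s == 0 else normal_pdf(zk, s * mu, sd)
        joint[config] = lik
    total = sum(joint.values())
    k = len(z)
    # Concordant enrichment: all data sets positive, or all data sets negative
    concordant = joint[(1,) * k] + joint[(-1,) * k]
    return concordant / total


def estimated_fdr(probs):
    """Rank items by concordance probability; FDR of each top-k list is mean(1 - p)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    result, cumulative = [], 0.0
    for rank, i in enumerate(order, start=1):
        cumulative += 1.0 - probs[i]
        result.append((i, cumulative / rank))
    return result


if __name__ == "__main__":
    # Hypothetical priors for two data sets: mostly unchanged, some concordant change
    w = {c: 0.1 / 6 for c in product(STATES, repeat=2)}
    w[(0, 0)] = 0.8
    w[(1, 1)] = 0.05
    w[(-1, -1)] = 0.05
    genes = [(3.1, 2.7), (0.2, -0.4), (2.5, -2.2)]
    probs = [concordance_probability(z, w) for z in genes]
    for index, fdr in estimated_fdr(probs):
        print(index, round(probs[index], 4), round(fdr, 4))
```

In this toy run, the gene that is strongly up-regulated in both data sets receives a high concordance probability, while the discordant gene (up in one data set, down in the other) receives a low one even though both of its z-scores are large. This illustrates why a concordant analysis differs from running GSEA separately on each data set.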
