BMC Bioinformatics

Statistical analysis and significance testing of serial analysis of gene expression data using a Poisson mixture model


Abstract

Background: Serial analysis of gene expression (SAGE) is used to obtain quantitative snapshots of the transcriptome. These profiles are count-based and are assumed to follow a Binomial or Poisson distribution. However, tag counts observed across multiple libraries (for example, one or more groups of biological replicates) have additional variance that cannot be accommodated by this assumption alone. Several models have been proposed to account for this effect, all of which use a continuous prior distribution to explain the excess variance. Here, a Poisson mixture model, which assumes that the excess variability arises from sampling a mixture of distinct components, is proposed, and its merits are discussed and evaluated.

Results: The goodness of fit of the Poisson mixture model on 15 sets of biological SAGE replicates is compared to that of the previously proposed hierarchical gamma-Poisson (negative binomial) model, and a substantial improvement is seen. Two further observations support the mixture model: 1) more mixture components are needed to fit the expression of tags that represent more than one transcript; and 2) components tend to cluster libraries into the same groups. A confidence score is presented that can identify tags that are differentially expressed between groups of SAGE libraries. Several examples where this test outperforms previously proposed tests are highlighted.

Conclusion: The Poisson mixture model performs well as a) a method to represent SAGE data from biological replicates, and b) a basis for assigning significance when testing for differential expression between multiple groups of replicates. Code for the R statistical software package is included to assist investigators in applying this model to their own data.
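
To make the idea of a Poisson mixture concrete, the following is a minimal sketch of fitting one to the counts of a single tag by expectation-maximization (EM). It is not the authors' R code. The function names, the use of EM, the treatment of library size as an exposure term, and the example counts are all assumptions made for illustration.

```python
import math


def poisson_logpmf(y, mu):
    """Log of the Poisson probability mass at count y with mean mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def _log_terms(y, size, weights, rates):
    """Per-component log(weight * Poisson(y; size * rate)) for one library."""
    return [math.log(w) + poisson_logpmf(y, size * r)
            for w, r in zip(weights, rates)]


def _logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def fit_poisson_mixture(counts, sizes, k, n_iter=200, floor=1e-12):
    """EM fit of a k-component Poisson mixture (illustrative, hypothetical API).

    counts[i] is the tag count in library i and sizes[i] is that library's
    total tag count. Each component has its own per-tag rate; the expected
    count in library i under component j is sizes[i] * rates[j].
    Returns (weights, rates, log_likelihood).
    """
    n = len(counts)
    props = sorted(y / s for y, s in zip(counts, sizes))
    # Start the rates at evenly spaced order statistics of the proportions.
    rates = [max(props[(2 * j + 1) * n // (2 * k)], floor) for j in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each library.
        resp = []
        for y, s in zip(counts, sizes):
            logs = _log_terms(y, s, weights, rates)
            total = _logsumexp(logs)
            resp.append([math.exp(v - total) for v in logs])
        # M-step: update mixing weights and component rates.
        for j in range(k):
            rj = [row[j] for row in resp]
            weights[j] = max(sum(rj) / n, floor)
            exposure = sum(r * s for r, s in zip(rj, sizes))
            tags = sum(r * y for r, y in zip(rj, counts))
            rates[j] = max(tags / max(exposure, floor), floor)
    loglik = sum(_logsumexp(_log_terms(y, s, weights, rates))
                 for y, s in zip(counts, sizes))
    return weights, rates, loglik


# Made-up counts for one tag in eight libraries of 50,000 tags each.
counts = [5, 7, 6, 4, 48, 52, 55, 50]
sizes = [50000] * 8
for k in (1, 2):
    weights, rates, loglik = fit_poisson_mixture(counts, sizes, k)
    print(k, [round(r * 50000, 2) for r in rates], round(loglik, 2))
```

In this made-up example a single Poisson rate cannot describe both the low and the high counts, so the two-component fit reaches a much higher log-likelihood. Choosing the number of components in practice would need a penalized criterion or a similar safeguard, which this sketch leaves out.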