Journal of the Royal Statistical Society
A Bayesian model selection approach for identifying differentially expressed transcripts from RNA sequencing data


Abstract

Recent advances in molecular biology allow the quantification of the transcriptome and scoring transcripts as differentially or equally expressed between two biological conditions. Although these two tasks are closely linked, the available inference methods treat them separately: a primary model is used to estimate expression and its output is post processed by using a differential expression model. In the paper, both issues are simultaneously addressed by proposing the joint estimation of expression levels and differential expression: the unknown relative abundance of each transcript can either be equal or not between two conditions. A hierarchical Bayesian model builds on the BitSeq framework and the posterior distribution of transcript expression and differential expression is inferred by using Markov chain Monte Carlo sampling. It is shown that the model proposed enjoys conjugacy for fixed dimension variables; thus the full conditional distributions are analytically derived. Two samplers are constructed, a reversible jump Markov chain Monte Carlo sampler and a collapsed Gibbs sampler, and the latter is found to perform better. A cluster representation of the aligned reads to the transcriptome is introduced, allowing parallel estimation of the marginal posterior distribution of subsets of transcripts under reasonable computing time. Under a fixed prior probability of differential expression the clusterwise sampler has the same marginal posterior distributions as the raw sampler, but a more general prior structure is also employed. The algorithm proposed is benchmarked against alternative methods by using synthetic data sets and applied to real RNA sequencing data. Source code is available on line from https://github.com/mqbssppe/cjBitSeq.
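
The cluster representation mentioned in the abstract can be illustrated with a small sketch. It assumes, as an interpretation not spelled out in the abstract, that two transcripts fall in the same cluster whenever a chain of multi-mapping reads links them, so that each cluster (its transcripts and its reads) could be passed to an independent sampler. The function name `cluster_transcripts`, the input layout, and the toy read and transcript identifiers are hypothetical and are not taken from cjBitSeq.

```python
from collections import defaultdict


def cluster_transcripts(read_alignments):
    """Group transcripts that are linked by shared reads.

    read_alignments maps a read id to the transcripts it aligns to.
    Assumption: a cluster is a connected component of the graph in which
    two transcripts are joined whenever at least one read aligns to both.
    """
    parent = {}

    def find(t):
        # Union-find lookup with path halving.
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Every transcript hit by the same read ends up in the same component.
    for transcripts in read_alignments.values():
        transcripts = list(transcripts)
        if not transcripts:
            continue
        find(transcripts[0])
        for other in transcripts[1:]:
            union(transcripts[0], other)

    # Collect the transcripts and reads belonging to each component.
    clusters = defaultdict(lambda: {"transcripts": set(), "reads": []})
    for read_id, transcripts in read_alignments.items():
        transcripts = list(transcripts)
        if not transcripts:
            continue
        root = find(transcripts[0])
        clusters[root]["transcripts"].update(transcripts)
        clusters[root]["reads"].append(read_id)
    return list(clusters.values())


# Toy alignments, made up for illustration only.
alignments = {
    "r1": ["tA", "tB"],
    "r2": ["tB", "tC"],
    "r3": ["tD"],
    "r4": ["tD", "tE"],
}
clusters = cluster_transcripts(alignments)
for cluster in clusters:
    print(sorted(cluster["transcripts"]), sorted(cluster["reads"]))
```

Because no read spans two clusters under this construction, each cluster's reads carry information only about its own transcripts, which is what would make per-cluster estimation parallelisable in a scheme like this one.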
