Journal: BMC Genomics

A Bayesian approach for estimating allele-specific expression from RNA-Seq data with diploid genomes


Abstract

RNA-sequencing (RNA-Seq) has become a popular tool for transcriptome profiling in mammals. However, accurate estimation of allele-specific expression (ASE) based on alignments of reads to the reference genome is challenging, because the reference contains only one allele at each locus, on a mosaic haploid genome. Even when diploid genome sequences are available, precise alignment of reads to the correct allele is difficult because of the high similarity between the corresponding allele sequences. We propose a Bayesian approach to estimate ASE from RNA-Seq data with diploid genome sequences. In the statistical framework, the haploid choice is modeled as a hidden variable and estimated simultaneously with isoform expression levels by variational Bayesian inference. Through analysis of simulated data, we demonstrate the effectiveness of the proposed approach in identifying ASE compared to the existing approach. We also show that our approach enables better quantification of isoform expression levels compared to the existing methods TIGAR2, RSEM and Cufflinks. In the real data analysis of the human reference lymphoblastoid cell line GM12878, some autosomal genes were identified as ASE genes, and skewed paternal X-chromosome inactivation in GM12878 was identified. The proposed method, called ASE-TIGAR, enables accurate estimation of gene expression from RNA-Seq data in an allele-specific manner. Our results show the effectiveness of utilizing personal genomic information for accurate estimation of ASE. An implementation of our method is available at http://nagasakilab.csml.org/ase-tigar.
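To make the idea of treating the haploid choice as a hidden variable more concrete, the following is a minimal sketch and not the ASE-TIGAR implementation. It assumes a single transcript with two alleles, takes per-read alignment likelihoods as given inputs, and places a symmetric Dirichlet prior on the allele fractions. The function names (`digamma`, `vb_allele_fractions`), the prior value `alpha0`, and the toy likelihoods are all hypothetical choices made for illustration. The actual method additionally estimates isoform expression levels across the whole transcriptome.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:  # shift x upward until the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result


def vb_allele_fractions(read_likelihoods, alpha0=1.0, n_iter=200, tol=1e-9):
    """Variational Bayes for a two-allele mixture (toy model, assumed here).

    read_likelihoods: list of (lik_paternal, lik_maternal) per read; each
    pair must contain at least one positive value.
    Returns the posterior Dirichlet parameters [alpha_pat, alpha_mat].
    """
    n_reads = len(read_likelihoods)
    alpha = [alpha0 + n_reads / 2.0, alpha0 + n_reads / 2.0]
    for _ in range(n_iter):
        # Expected log allele fractions under the current posterior.
        total_alpha = sum(alpha)
        e_log = [digamma(a) - digamma(total_alpha) for a in alpha]
        # Soft assignment of each read to an allele (the hidden variable).
        counts = [0.0, 0.0]
        for liks in read_likelihoods:
            weights = [math.exp(e_log[k]) * liks[k] for k in range(2)]
            norm = sum(weights)
            for k in range(2):
                counts[k] += weights[k] / norm
        new_alpha = [alpha0 + c for c in counts]
        converged = max(abs(a - b) for a, b in zip(alpha, new_alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    return alpha


if __name__ == "__main__":
    # Toy data: 60 reads favor the paternal allele, 20 favor the maternal
    # allele, and 20 align equally well to both.
    reads = [(1.0, 0.01)] * 60 + [(0.01, 1.0)] * 20 + [(1.0, 1.0)] * 20
    alpha_pat, alpha_mat = vb_allele_fractions(reads)
    print("posterior mean paternal fraction:",
          alpha_pat / (alpha_pat + alpha_mat))
```

In this sketch the ambiguous reads are not discarded or split evenly. They are assigned in proportion to the current estimate of the allele fractions, which is the reason for inferring the allele of origin jointly with abundance rather than fixing it at alignment time.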
