Journal: PLoS Genetics

A Flexible, Efficient Binomial Mixed Model for Identifying Differential DNA Methylation in Bisulfite Sequencing Data


Abstract

Identifying sources of variation in DNA methylation levels is important for understanding gene regulation. Recently, bisulfite sequencing has become a popular tool for investigating DNA methylation levels. However, modeling bisulfite sequencing data is complicated by dramatic variation in coverage across sites and individual samples, and by the computational challenges of controlling for genetic covariance in count data. To address these challenges, we present a binomial mixed model and an efficient, sampling-based algorithm (MACAU: Mixed model association for count data via data augmentation) for approximate parameter estimation and p-value computation. This framework allows us to account simultaneously for both the over-dispersed, count-based nature of bisulfite sequencing data and genetic relatedness among individuals. Using simulations and two real data sets (whole genome bisulfite sequencing (WGBS) data from Arabidopsis thaliana and reduced representation bisulfite sequencing (RRBS) data from baboons), we show that our method provides well-calibrated test statistics in the presence of population structure. Further, it improves power to detect differentially methylated sites: in the RRBS data set, MACAU detected 1.6-fold more age-associated CpG sites than a beta-binomial model (the next best approach). Changes in these sites are consistent with known age-related shifts in DNA methylation levels, and are enriched near genes that are differentially expressed with age in the same population. Taken together, our results indicate that MACAU is an efficient, effective tool for analyzing bisulfite sequencing data, with particular salience to analyses of structured populations. MACAU is freely available at www.xzlab.org/software.html.
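
To make the data structure described above concrete, the following is a minimal simulation sketch of counts drawn from a generic binomial mixed model with a logit link. It is not the MACAU algorithm and does not reproduce the paper's exact parameterization: the function name `simulate_binomial_mixed`, the parameters `beta`, `h2`, `sigma2`, and `mean_depth`, and the way the relatedness matrix `K` is built are all assumptions made for illustration. The sketch shows the three features the abstract highlights: read depth that varies across samples, a correlated random effect driven by relatedness, and an independent random effect that produces over-dispersion.

```python
import numpy as np


def simulate_binomial_mixed(n=50, beta=0.5, h2=0.3, sigma2=1.0,
                            mean_depth=20, seed=0):
    """Simulate methylated read counts at one site (illustrative only).

    Assumed model (hypothetical parameterization, not taken from the abstract):
        y_i ~ Binomial(r_i, pi_i)
        logit(pi_i) = beta * x_i + g_i + e_i
        g ~ MVN(0, sigma2 * h2 * K)        # relatedness-driven effect
        e ~ N(0, sigma2 * (1 - h2))        # independent over-dispersion
    """
    rng = np.random.default_rng(seed)

    # Hypothetical relatedness matrix: Gram matrix of random features,
    # which is symmetric positive semidefinite by construction.
    z = rng.standard_normal((n, 200))
    K = z @ z.T / 200.0

    # Predictor of interest (for example, standardized age).
    x = rng.standard_normal(n)

    # Total read depth varies across samples; add 1 to avoid zero coverage.
    r = rng.poisson(mean_depth, size=n) + 1

    # Random effects on the logit scale.
    g = rng.multivariate_normal(np.zeros(n), sigma2 * h2 * K)
    e = rng.normal(0.0, np.sqrt(sigma2 * (1.0 - h2)), size=n)

    # Methylation probability and methylated read counts.
    eta = beta * x + g + e
    pi = 1.0 / (1.0 + np.exp(-eta))
    y = rng.binomial(r, pi)
    return x, r, y, K


if __name__ == "__main__":
    x, r, y, K = simulate_binomial_mixed()
    print("methylation proportions (first 5):", (y / r)[:5])
```

Data simulated this way could serve as a test bed for comparing a plain binomial fit against methods that model over-dispersion or relatedness, which is the kind of comparison the abstract reports.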
