Journal: BMC Bioinformatics

An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data

Abstract

DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high-resolution genome-wide methylation profiles. Statistical modeling and analysis are employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical dependencies between neighboring methylation sites, thus ignoring significant information available in WGBS reads. We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by utilizing a joint probability model that encapsulates all information available in WGBS methylation reads and produces accurate results even when applied to single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it utilizes the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrates a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data.
This contribution demonstrates clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. By employing this methodology, substantial improvement of DNA methylation analysis can be achieved by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods.
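
The three ingredients named in the abstract (a 1D Ising joint distribution over neighboring methylation sites, the Shannon entropy, and the Jensen-Shannon distance) can be illustrated with a small sketch. This is not the authors' implementation. The parameterization (a per-site term `a` and a nearest-neighbor coupling `c` acting on 0/1 states mapped to -1/+1), the brute-force enumeration over all patterns, and every function name are assumptions made for illustration. Enumeration is only feasible for a handful of sites.

```python
import itertools
import math


def ising_joint(a, c):
    """Joint distribution over patterns x in {0,1}^N under an assumed 1D Ising form.

    a[n] is a hypothetical per-site propensity, c[n] a hypothetical coupling
    between sites n and n+1. Returns a dict mapping each pattern to its probability.
    """
    n_sites = len(a)
    patterns = list(itertools.product((0, 1), repeat=n_sites))
    weights = []
    for x in patterns:
        s = [2 * xi - 1 for xi in x]  # map unmethylated/methylated (0/1) to -1/+1
        exponent = sum(a[i] * s[i] for i in range(n_sites))
        exponent += sum(c[i] * s[i] * s[i + 1] for i in range(n_sites - 1))
        weights.append(math.exp(exponent))
    z = sum(weights)  # partition function, computed by brute force
    return {x: w / z for x, w in zip(patterns, weights)}


def methylation_level_dist(joint):
    """Distribution of the number of methylated sites (0..N) implied by the joint."""
    n_sites = len(next(iter(joint)))
    dist = [0.0] * (n_sites + 1)
    for x, p in joint.items():
        dist[sum(x)] += p
    return dist


def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base 2), a value in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding errors


# Illustrative parameters only: 4 sites, positively coupled neighbors.
reference = methylation_level_dist(ising_joint([1.0] * 4, [0.5] * 3))
test = methylation_level_dist(ising_joint([-1.0] * 4, [0.5] * 3))

print("reference entropy (bits):", shannon_entropy(reference))
print("test entropy (bits):", shannon_entropy(test))
print("Jensen-Shannon distance:", jensen_shannon_distance(reference, test))
```

In this toy setup the coupling terms `c` are what a purely marginal (site-by-site) model would omit. Setting them to zero makes the sites independent, which is one way to see how much the joint model changes the entropy and the distance.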
