Journal of the American Statistical Association

Template-Based Models for Genome-Wide Analysis of Next-Generation Sequencing Data at Base-Pair Resolution

Abstract

We consider the problem of estimating the genome-wide distribution of nucleosome positions from paired-end sequencing data. We develop a modeling approach based on nonparametric templates to control for variability, along the sequence, in the read counts associated with nucleosomal DNA that is due to enzymatic digestion and other sample-preparation steps, and we develop a calibrated Bayesian method to detect local concentrations of nucleosome positions. We also introduce a set of estimands that provides rich, interpretable summaries of nucleosome positioning. Inference is carried out via a distributed Hamiltonian Monte Carlo algorithm that can scale linearly with the length of the genome being analyzed. We provide MPI-based Python implementations of the proposed methods, both stand-alone and on Amazon EC2, which can provide inferences on an entire Saccharomyces cerevisiae genome in less than 1 hour on EC2. We evaluate the accuracy and reproducibility of the inferences using a factorially designed simulation study and experimental replicates. The template-based approach we develop here is also applicable to single-end sequencing data by using alternative sources of fragment-length information, and to ordered and sequential data more generally. It provides a flexible and scalable alternative to mixture models, hidden Markov models, and Parzen-window methods. Supplementary materials for this article are available online.
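The abstract does not state the model equations, so the following is only a rough illustration of the general idea of a template-based model for read counts, not the authors' specification. It assumes that the expected coverage at each base pair is a constant background rate plus the convolution of a nucleosome-center intensity `theta` with a fixed template, and that observed counts are Poisson. The function names, the fixed template, and the `background` parameter are hypothetical; in the paper the templates are nonparametric and inference uses distributed Hamiltonian Monte Carlo, neither of which is shown here.

```python
import numpy as np


def expected_coverage(theta, template, offset, background=0.1):
    """Hypothetical template model of read coverage along a chromosome.

    theta[j]    : intensity of nucleosome centers at base pair j
    template[k] : reads contributed at base pair j + k - offset by a center at j
    background  : assumed constant rate so every position has positive expectation
    """
    theta = np.asarray(theta, dtype=float)
    template = np.asarray(template, dtype=float)
    # full[i] = sum_j theta[j] * template[i - j]
    full = np.convolve(theta, template, mode="full")
    # mu[p] = sum_j theta[j] * template[p - j + offset] = full[p + offset]
    return background + full[offset:offset + theta.size]


def poisson_loglik(counts, mu):
    """Poisson log-likelihood, dropping the log(counts!) term (constant in mu)."""
    counts = np.asarray(counts, dtype=float)
    return float(np.sum(counts * np.log(mu) - mu))


# Toy example: one nucleosome centered at base pair 10 on a 20-bp sequence.
template = np.array([0.25, 0.5, 1.0, 0.5, 0.25])  # peak at index 2
theta_true = np.zeros(20)
theta_true[10] = 2.0
mu_true = expected_coverage(theta_true, template, offset=2)

# Shifting the nucleosome by 3 bp lowers the likelihood of the same counts.
theta_shift = np.roll(theta_true, 3)
mu_shift = expected_coverage(theta_shift, template, offset=2)
counts = mu_true  # noise-free "observed" counts, for illustration only
print(poisson_loglik(counts, mu_true) > poisson_loglik(counts, mu_shift))  # True
```

Because the expected coverage is linear in `theta`, the log-likelihood and its gradient are cheap to evaluate, which is the kind of quantity a gradient-based sampler such as Hamiltonian Monte Carlo needs at every step.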