Template-Based Models for Genome-Wide Analysis of Next-Generation Sequencing Data at Base-Pair Resolution

Blocker Alexander W.; Airoldi Edoardo M.

Abstract

We consider the problem of estimating the genome-wide distribution of nucleosome positions from paired-end sequencing data. We develop a modeling approach based on nonparametric templates to control for the variability along the sequence of read counts associated with nucleosomal DNA due to enzymatic digestion and other sample preparation steps, and we develop a calibrated Bayesian method to detect local concentrations of nucleosome positions. We also introduce a set of estimands that provides rich, interpretable summaries of nucleosome positioning. Inference is carried out via a distributed Hamiltonian Monte Carlo algorithm that can scale linearly with the length of the genome being analyzed. We provide MPI-based Python implementations of the proposed methods, stand-alone and on Amazon EC2, which can provide inferences on an entire Saccharomyces cerevisiae genome in less than 1 hr on EC2. We evaluate the accuracy and reproducibility of the inferences using a factorially designed simulation study and experimental replicates. The template-based approach we develop here is also applicable to single-end sequencing data by using alternative sources of fragment length information, and to ordered and sequential data more generally. It provides a flexible and scalable alternative to mixture models, hidden Markov models, and Parzen-window methods. Supplementary materials for this article are available online.
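
The template idea in the abstract can be illustrated with a toy forward model. The sketch below is not the authors' model or code: it simply assumes that the expected read count at each base pair is a latent nucleosome-position intensity convolved with a fixed template, and that observed counts are Poisson. The function names, the template shape, and the toy values are all illustrative assumptions.

```python
import numpy as np


def expected_counts(position_intensity, template):
    """Spread a latent per-base position intensity with a template.

    Assumption for illustration: the template is a fixed, known vector
    that is normalised to sum to 1, so total mass is preserved away
    from the sequence edges.
    """
    template = np.asarray(template, dtype=float)
    template = template / template.sum()
    return np.convolve(position_intensity, template, mode="same")


def simulate_counts(position_intensity, template, seed=0):
    """Draw Poisson read counts around the template-smoothed mean."""
    rng = np.random.default_rng(seed)
    return rng.poisson(expected_counts(position_intensity, template))


# Toy sequence of 50 base pairs with two hypothetical nucleosome positions.
intensity = np.zeros(50)
intensity[[15, 35]] = [40.0, 25.0]

# Hypothetical symmetric template describing digestion variability.
template = np.array([1.0, 2.0, 4.0, 2.0, 1.0])

mu = expected_counts(intensity, template)
counts = simulate_counts(intensity, template)
```

Recovering `intensity` from `counts` is the deconvolution problem named in the keywords; the paper addresses it with Bayesian inference via Hamiltonian Monte Carlo, which this sketch does not attempt.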

Bibliographic details

Source
Journal of the American Statistical Association, 2016, Issue 515, pp. 967-987 (21 pages)
Authors
Blocker Alexander W.; Airoldi Edoardo M.
Author affiliations

Harvard Univ, Dept Stat, 1 Oxford St, Cambridge, MA 02138 USA;

Harvard Univ, Dept Stat, 1 Oxford St, Cambridge, MA 02138 USA;

Original format: PDF
Language: English
Keywords
Calibrated Bayesian detection; Deconvolution; Hamiltonian Monte Carlo; Massive data; Measurement error; Nucleosomes; Parallel computation; Yeast;

Similar literature

1. FusionMap: detecting fusion genes from next-generation sequencing data at base-pair resolution. [J]. Ge H, Liu K, Juan T, Bioinformatics. 2011, Issue 14
2. FusionMap: detecting fusion genes from next-generation sequencing data at base-pair resolution [J]. Wolfgang Hoeck, Bioinformatics. 2011, Issue 14
3. A Novel Genome-Information Content-Based Statistic for Genome-Wide Association Analysis Designed for Next-Generation Sequencing Data [J]. Li Luo, Yun Zhu, Momiao Xiong, Journal of Computational Biology: A Journal of Computational Molecular Cell Biology. 2012, Issue 6
4. A Comprehensive Analysis Workflow for Genome-Wide Screening Data from ChIP-Sequencing Experiments [C]. Hatice Gulcin Ozer, Doruk Bozdag, Terry Camerlengo, Bioinformatics and Computational Biology. 2009
5. Datamining of genome-wide nucleosome data generated by next-generation sequencing [D]. Zhang, Zhenhai. 2011
6. SNPAAMapper: An efficient genome-wide SNP variant analysis pipeline for next-generation sequencing data [O]. Yongsheng Bai, James Cavalcoli. 2013
