Model-based quality assessment and base-calling for second-generation sequencing data.

Bravo HC; Irizarry RA

Abstract

Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be realized within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads (strings of A, C, G, or T, between 30 and 100 characters long), which are the result of complex processing, known as base-calling, of noisy continuous fluorescence intensity measurements. The complexity of the base-calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across-sample variation at the single nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec-gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base-calling procedure of the Illumina/Solexa GA platform. Model parameters have a straightforward interpretation in terms of the chemistry of base-calling, allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these informative estimates, readily usable in quality assessment tools, while significantly improving base-calling performance.
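
To make the idea of base-calling as discretization concrete, here is a minimal sketch. It is not the model proposed in the article, whose details are not given in this abstract. It assumes each sequencing cycle yields four fluorescence intensities (one per nucleotide channel, ordered A, C, G, T), calls the brightest channel, and reports that channel's share of the total signal as a crude stand-in for per-base quality. The function name, the channel order, and the example numbers are all hypothetical.

```python
BASES = "ACGT"  # assumed channel order; real instruments may differ


def naive_base_call(intensities):
    """Discretize per-cycle intensities into a read by picking the brightest channel.

    intensities: iterable of 4-element sequences, one per sequencing cycle.
    Returns (read, purity), where purity[i] is the called channel's share of
    the total non-negative signal in cycle i (0.0 if the cycle has no signal).
    """
    read = []
    purity = []
    for cycle in intensities:
        # Clip negative values, which can arise after background correction.
        signal = [max(float(v), 0.0) for v in cycle]
        best = max(range(4), key=lambda i: signal[i])
        total = sum(signal)
        read.append(BASES[best])
        purity.append(signal[best] / total if total > 0 else 0.0)
    return "".join(read), purity


# Made-up intensities for a three-cycle read: the last cycle is ambiguous.
example = [
    [900, 50, 30, 20],
    [100, 120, 800, 90],
    [300, 280, 260, 310],
]
read, purity = naive_base_call(example)
print(read, [round(p, 2) for p in purity])
```

In this toy input the third cycle has four nearly equal channels, so its purity is low even though a base is still emitted. That is the kind of hidden uncertainty in discretized reads that the abstract argues should be modeled and quantified.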

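Bibliographic record

Source
Biometrics: Journal of the Biometric Society : An International Society Devoted to the Mathematical and Statistical Aspects of Biology | 2010, Issue 3 | 10 pages
Authors
Bravo HC; Irizarry RA
Indexing information
Original format: PDF
Language: English
CLC classification: General biology; Research methods and techniques in the biological sciences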