Source: U.S. National Institutes of Health (NIH) literature, PLoS Clinical Trials

Synthetic Spike-in Standards Improve Run-Specific Systematic Error Analysis for DNA and RNA Sequencing

Abstract

While the importance of random sequencing errors decreases at higher DNA or RNA sequencing depths, systematic sequencing errors (SSEs) dominate at high sequencing depths and can be difficult to distinguish from biological variants. These SSEs can cause base quality scores to underestimate the probability of error at certain genomic positions, resulting in false positive variant calls, particularly in mixtures such as samples with RNA editing, tumors, circulating tumor cells, bacteria, mitochondrial heteroplasmy, or pooled DNA. Most algorithms proposed for correction of SSEs require a data set that is used to calculate the association of SSEs with various features of the reads and their sequence context. This data set is typically either a part of the data set being “recalibrated” (Genome Analysis ToolKit, or GATK) or a separate data set with special characteristics (SysCall). Here, we combine the advantages of these approaches by adding synthetic RNA spike-in standards to human RNA and using GATK to recalibrate base quality scores with reads mapped to the spike-in standards. Compared to conventional GATK recalibration that uses reads mapped to the genome, spike-ins improve the accuracy of Illumina base quality scores by a mean of 5 Phred-scaled quality score units, and by as much as 13 units at CpG sites. In addition, since the spike-in data used for recalibration are independent of the genome being sequenced, our method allows run-specific recalibration even for the many species without a comprehensive and accurate SNP database. We also use GATK with the spike-in standards to demonstrate that the Illumina RNA sequencing runs overestimate quality scores for AC, CC, GC, GG, and TC dinucleotides, while SOLiD has fewer dinucleotide SSEs but more SSEs for certain cycles. We conclude that using these DNA and RNA spike-in standards with GATK improves base quality score recalibration.
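The comparison between reported and empirical quality that underlies recalibration can be illustrated with a minimal Python sketch. It is not GATK's implementation. It assumes that each base call in a read mapped to a spike-in standard has already been reduced to a tuple of reported quality, preceding-plus-current dinucleotide, and a mismatch flag. Because the spike-in sequence is known, every mismatch is counted as a sequencing error. The function names, the pseudocount, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def phred(p_error):
    """Convert an error probability to a Phred-scaled quality score."""
    return -10.0 * math.log10(p_error)


def empirical_quality(observations):
    """Return the empirical Phred quality per (reported quality, dinucleotide).

    observations: iterable of (reported_q, dinucleotide, is_mismatch) tuples,
    one per base call in reads mapped to the spike-in standards.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [mismatches, total bases]
    for reported_q, dinuc, is_mismatch in observations:
        entry = counts[(reported_q, dinuc)]
        entry[0] += int(is_mismatch)
        entry[1] += 1
    # Assumption of this sketch: a pseudocount (1 mismatch in 2 bases)
    # keeps the error rate above zero so that log10 is always defined.
    return {key: phred((mm + 1) / (n + 2)) for key, (mm, n) in counts.items()}


# Hypothetical toy data: 998 bases reported at Q30 in a "GG" context,
# of which 9 disagree with the known spike-in sequence.
toy = [(30, "GG", i < 9) for i in range(998)]
table = empirical_quality(toy)
print(round(table[(30, "GG")], 1))  # 20.0
```

In this toy case the bases were reported at Q30 but behave like Q20, so the reported score overestimates quality by 10 Phred units for that dinucleotide. This is the kind of context-specific discrepancy that recalibration is meant to correct.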
