Journal: Genome Research

ECHO: a reference-free short-read error correction algorithm.



Abstract

Developing accurate, scalable algorithms to improve data quality is an important computational challenge associated with recent advances in high-throughput sequencing technology. In this study, a novel error-correction algorithm, called ECHO, is introduced for correcting base-call errors in short reads without the need for a reference genome. Unlike most previous methods, ECHO does not require the user to specify parameters whose optimal values are typically unknown a priori. ECHO automatically sets the parameters in the assumed model and estimates error characteristics specific to each sequencing run, while keeping the running time within the range of practical use. ECHO is based on a probabilistic model and is able to assign a quality score to each corrected base. Furthermore, it explicitly models heterozygosity in diploid genomes and provides a reference-free method for detecting bases that originated from heterozygous sites. On both real and simulated data, ECHO improves on the accuracy of previous error-correction methods by several-fold to an order of magnitude, depending on the sequence coverage depth and the position in the read. The improvement is most pronounced toward the end of the read, where previous methods become noticeably less effective. Using a whole-genome yeast data set, it is demonstrated here that ECHO is capable of coping with nonuniform coverage. It is also shown that using ECHO to perform error correction as a preprocessing step considerably facilitates de novo assembly, particularly in the case of low-to-moderate sequence coverage depth.
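The abstract does not describe ECHO's model in detail, so the following is only a toy illustration of the general idea of assigning a quality score to a corrected base from a probabilistic model. It is not ECHO's algorithm. It assumes a uniform prior over the four bases, a single fixed per-base error rate, independent errors, and a Phred-style score capped at an arbitrary maximum. The function names `posterior_over_bases` and `call_base` and the parameters `error_rate` and `max_quality` are hypothetical.

```python
import math

BASES = "ACGT"


def posterior_over_bases(observed, error_rate=0.01):
    """Posterior P(true base | observed bases) under a uniform prior.

    Toy assumption: each observed base is independently correct with
    probability 1 - error_rate, and otherwise is one of the three other
    bases with equal probability.
    """
    log_like = {}
    for base in BASES:
        total = 0.0
        for obs in observed:
            if obs == base:
                total += math.log(1.0 - error_rate)
            else:
                total += math.log(error_rate / 3.0)
        log_like[base] = total
    # Subtract the maximum before exponentiating to avoid underflow.
    peak = max(log_like.values())
    weights = {b: math.exp(ll - peak) for b, ll in log_like.items()}
    norm = sum(weights.values())
    return {b: w / norm for b, w in weights.items()}


def call_base(observed, error_rate=0.01, max_quality=60):
    """Return the most probable base and a Phred-style quality score."""
    post = posterior_over_bases(observed, error_rate)
    best = max(post, key=post.get)
    # Probability that the called base is wrong under the toy model.
    p_err = sum(p for b, p in post.items() if b != best)
    if p_err <= 0.0:
        return best, max_quality
    quality = min(max_quality, round(-10.0 * math.log10(p_err)))
    return best, quality


if __name__ == "__main__":
    # Five overlapping reads cover one position; one disagrees.
    print(call_base("AAAAC"))
    # Two reads that disagree give an ambiguous, low-quality call.
    print(call_base("AC"))
```

In this sketch the score rises with the number of agreeing observations and falls when the observations conflict, which is the qualitative behaviour one would expect from any consensus-based quality score. ECHO, by contrast, is described as estimating its error characteristics from each sequencing run rather than taking a fixed `error_rate`.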
