...
Journal: Bioinformatics

Reptile: representative tiling for short read error correction

Abstract

Motivation: Error correction is critical to the success of next-generation sequencing applications, such as resequencing and de novo genome sequencing. It is especially important for high-throughput short-read sequencing, where reads are much shorter and more abundant, and errors more frequent, than in traditional Sanger sequencing. Processing massive numbers of short reads with existing error correction methods is both compute- and memory-intensive, yet the results are far from satisfactory when applied to real datasets.

Results: We present a novel approach, termed Reptile, for error correction in short-read data from next-generation sequencing. Reptile works with the spectrum of k-mers from the input reads and corrects errors by simultaneously examining: (i) Hamming distance-based correction possibilities for potentially erroneous k-mers; and (ii) neighboring k-mers from the same read for correct contextual information. Because it does not need to store the input data, Reptile can handle datasets that do not fit in main memory. In addition to sequence data, Reptile can make use of available quality score information. Our experiments show that Reptile outperforms previous methods in the percentage of errors removed from the data and in the accuracy of true base assignment. It also achieves a significant reduction in run time and memory usage compared with previous methods, making it more practical for short-read error correction when sampling larger genomes.
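The two ingredients named in the abstract, Hamming-distance candidates for low-count k-mers and a contextual check against neighboring k-mers from the same read, can be illustrated with a toy sketch. This is not Reptile's actual algorithm (it omits tiling, quality scores and the paper's parameter choices); the function names `kmer_spectrum` and `correct_read`, the `solid` count threshold, and the greedy left-to-right strategy are all hypothetical assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable

BASES = "ACGT"


def kmer_spectrum(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer in the reads (the k-mer spectrum).

    Reads are consumed one at a time, so they could be streamed from disk;
    only the counts are kept in memory.
    """
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def correct_read(read: str, counts: Counter, k: int, solid: int = 2) -> str:
    """Hypothetical greedy sketch of spectrum-based correction.

    A k-mer seen fewer than `solid` times is treated as possibly erroneous.
    Each Hamming-distance-1 variant of it is a candidate fix; a candidate is
    accepted only if it is solid itself and, after the edit, the next
    overlapping k-mer of the same read is also solid (the context check).
    """
    bases = list(read)
    for i in range(len(bases) - k + 1):
        kmer = "".join(bases[i:i + k])
        if counts[kmer] >= solid:
            continue  # trusted k-mer: leave it alone
        best_edit, best_count = None, 0
        for pos in range(k):
            for base in BASES:
                if base == kmer[pos]:
                    continue
                cand = kmer[:pos] + base + kmer[pos + 1:]
                if counts[cand] < solid:
                    continue  # this Hamming neighbor is itself untrusted
                # Context check: apply the edit, then inspect the neighboring k-mer.
                trial = bases.copy()
                trial[i + pos] = base
                nxt = trial[i + 1:i + 1 + k]
                if len(nxt) == k and counts["".join(nxt)] < solid:
                    continue
                if counts[cand] > best_count:
                    best_edit, best_count = (i + pos, base), counts[cand]
        if best_edit is not None:
            bases[best_edit[0]] = best_edit[1]
    return "".join(bases)


if __name__ == "__main__":
    genome = "AACCGTTAGCATGGCA"
    noisy = "AACCGTTATCATGGCA"  # G -> T substitution at position 8
    counts = kmer_spectrum([genome] * 3 + [noisy], k=5)
    print(correct_read(noisy, counts, k=5))  # AACCGTTAGCATGGCA
```

In the toy run, the first window covering the substitution ("GTTAT") occurs only once, its only solid Hamming neighbor is "GTTAG", and the following k-mer "TTAGC" is solid after the edit, so the base is restored. A real corrector must also cope with repeats, several errors per window and coverage variation, which is where Reptile's tiles and quality scores come in.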