Journal: BMC Bioinformatics

EC: an efficient error correction algorithm for short reads

Abstract

Background: Highly parallel next-generation sequencing (NGS) techniques produce millions to billions of short reads from a genomic sequence in a single run. Because of limitations of NGS technologies, the reads may contain errors. The error rate can be reduced by trimming the reads and by correcting their erroneous bases. Correcting the reads first yields higher-quality data and greatly reduces the computational complexity of many biological applications. We have developed a novel error correction algorithm called EC and compared it with four other state-of-the-art algorithms using both real and simulated sequencing reads.

Results: Extensive and rigorous experiments show that EC is an effective, scalable, and efficient error correction tool. The real reads used in our performance evaluation are Illumina-generated short reads of various lengths. The six experimental datasets are taken from the Sequence Read Archive (SRA) at NCBI. The simulated reads are obtained by picking substrings from random positions of reference genomes. To introduce errors, some bases of the simulated reads are changed to other bases with some probability.

Conclusions: Error correction is a vital problem in biology, especially for NGS data. In this paper we present a novel algorithm, called Error Corrector (EC), for correcting substitution errors in biological sequencing reads. We plan to investigate whether the techniques introduced in this paper can also handle insertion and deletion errors.

Software availability: The implementation is freely available for non-commercial purposes. It can be downloaded from: http://engr.uconn.edu/~rajasek/EC.zip
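The read simulation described under Results (substrings picked from random positions of a reference genome, then some bases substituted) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' simulator: the function name `simulate_reads`, its parameters, the single uniform per-base `error_rate`, and the uniform choice among the three other bases are all assumptions, since the abstract does not specify the error model.

```python
import random

BASES = "ACGT"


def simulate_reads(genome, read_length, num_reads, error_rate, seed=0):
    """Sample reads from `genome` and inject substitution errors.

    Hypothetical helper: assumes one uniform per-base error probability
    and a uniform choice among the other three bases.
    Returns (true_reads, noisy_reads), two lists of equal length.
    """
    rng = random.Random(seed)
    true_reads, noisy_reads = [], []
    for _ in range(num_reads):
        # Pick a random start so the whole read fits inside the genome.
        start = rng.randrange(len(genome) - read_length + 1)
        read = genome[start:start + read_length]
        noisy = []
        for base in read:
            if rng.random() < error_rate:
                # Substitution error: replace with a different base.
                noisy.append(rng.choice([b for b in BASES if b != base]))
            else:
                noisy.append(base)
        true_reads.append(read)
        noisy_reads.append("".join(noisy))
    return true_reads, noisy_reads


if __name__ == "__main__":
    # Toy reference sequence, made up for illustration.
    reference = "ACGTTGCAAGCTTAGGCTAACGTA"
    clean, noisy = simulate_reads(reference, read_length=8, num_reads=3,
                                  error_rate=0.1)
    for c, n in zip(clean, noisy):
        print(c, n)
```

Keeping the error-free reads alongside the noisy ones makes it possible to count how many injected substitutions a correction tool actually fixes.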
