BMC Bioinformatics

An efficient error correction algorithm using FM-index


Abstract

High-throughput sequencing offers higher throughput and lower cost for sequencing a genome. However, sequencing errors, including mismatches and indels, may be introduced during sequencing. Because errors may reduce the accuracy of subsequent de novo assembly, error correction is necessary prior to assembly. However, existing correction methods still face trade-offs among correction power, accuracy, and speed. We develop a novel overlap-based error correction algorithm using FM-index (called FMOE). FMOE first identifies overlapping reads by aligning a query read simultaneously against multiple reads compressed by FM-index. Subsequently, sequencing errors are corrected by k-mer voting from overlapping reads only. The experimental results indicate that FMOE has the highest correction power with comparable accuracy and speed. Compared with other methods, our algorithm performs better on long-read than on short-read datasets. The assembly results indicate that each algorithm has its own strengths and weaknesses, whereas FMOE is well suited to long or good-quality reads. FMOE is freely available at https://github.com/ythuang0522/FMOC .
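The two ideas named in the abstract, querying reads through an FM-index and correcting a base by k-mer voting, can be illustrated with a small Python sketch. This is not the FMOE implementation: the function names (`build_fm_index`, `count_occurrences`, `vote_base`), the naive suffix sorting, the window size, and the toy reads are all assumptions made for illustration. In particular, FMOE votes only among reads found to overlap the query, whereas this sketch skips overlap detection and votes over every read in the index.

```python
from collections import Counter


def build_fm_index(reads):
    """Build a toy FM-index over reads, each terminated by '$'."""
    text = "$".join(reads) + "$"
    # Naive suffix sorting: fine for a demo, far too slow for real read sets.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)
    # c_table[ch]: number of characters in the text strictly smaller than ch.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return c_table, occ, len(bwt)


def count_occurrences(fm, pattern):
    """Count occurrences of pattern in the indexed reads by backward search."""
    c_table, occ, n = fm
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


def vote_base(fm, read, pos, k=5):
    """Pick the base at read[pos] whose surrounding k-mer has the most support.

    Simplification: support is counted over all indexed reads, not only
    over reads that overlap the query as the abstract describes.
    """
    start = max(0, min(pos - k // 2, len(read) - k))
    window = read[start:start + k]
    offset = pos - start
    support = {
        base: count_occurrences(fm, window[:offset] + base + window[offset + 1:])
        for base in "ACGT"
    }
    return max(support, key=support.get), support


# Hypothetical reads sampled from one short sequence; the last read carries
# a single substitution error at position 4.
reads = ["ATGGCGTACC", "GGCGTACCTG", "CGTACCTGAA", "TACCTGAAGT", "GCGTGCCTGA"]
fm = build_fm_index(reads)
best, support = vote_base(fm, reads[-1], pos=4)
print(best, support)
```

Because every read ends with the `$` terminator, a pattern made only of A, C, G, and T can never match across a read boundary, so the counts reflect support within individual reads.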
