
Innovations and challenges in detecting long read overlaps: an evaluation of the state-of-the-art


Abstract

Identifying overlaps between error-prone long reads, specifically those from Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PB), is essential for certain downstream applications, including error correction and de novo assembly. Though akin to the read-to-reference alignment problem, read-to-read overlap detection is a distinct problem that can benefit from specialized algorithms that perform efficiently and robustly on high error rate long reads. Here, we review the current state-of-the-art read-to-read overlap tools for error-prone long reads, including BLASR, DALIGNER, MHAP, GraphMap and Minimap. These specialized bioinformatics tools differ not just in their algorithmic designs and methodology, but also in their robustness of performance on a variety of datasets, time and memory efficiency and scalability. We highlight the algorithmic features of these tools, as well as their potential issues and biases when utilizing any particular method. To supplement our review of the algorithms, we benchmarked these tools, tracking their resource needs and computational performance, and assessed the specificity and precision of each. In the versions of the tools tested, we observed that Minimap is the most computationally efficient, specific and sensitive method on the ONT datasets tested; whereas GraphMap and DALIGNER are the most specific and sensitive methods on the tested PB datasets. The concepts surveyed may apply to future sequencing technologies, as scalability is becoming more relevant with increased sequencing throughput.

Supplementary information: Supplementary data are available at Bioinformatics online.
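The abstract reports sensitivity, specificity and precision for each overlap tool. As a minimal sketch of how such figures could be computed from a set of predicted overlaps and a set of known true overlaps, consider the following. The function names `pair` and `overlap_metrics`, the read identifiers, and the choice to count every unordered read pair as one candidate are illustrative assumptions, not the authors' benchmarking procedure.

```python
from itertools import combinations


def pair(a, b):
    """Canonical unordered read pair, so (a, b) and (b, a) compare equal."""
    return (a, b) if a <= b else (b, a)


def overlap_metrics(reads, true_pairs, predicted_pairs):
    """Score predicted read-to-read overlaps against a truth set.

    Assumption (hypothetical setup): every unordered pair of distinct
    reads is a candidate, and a pair is either overlapping or not.
    """
    truth = {pair(a, b) for a, b in true_pairs}
    pred = {pair(a, b) for a, b in predicted_pairs}
    all_pairs = {pair(a, b) for a, b in combinations(reads, 2)}

    tp = len(truth & pred)              # true overlaps that were reported
    fp = len(pred - truth)              # reported pairs that do not overlap
    fn = len(truth - pred)              # true overlaps that were missed
    tn = len(all_pairs - truth - pred)  # non-overlapping pairs left unreported

    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


if __name__ == "__main__":
    reads = ["r1", "r2", "r3", "r4"]
    truth = [("r1", "r2"), ("r2", "r3")]
    predicted = [("r2", "r1"), ("r3", "r4")]
    print(overlap_metrics(reads, truth, predicted))
```

Enumerating all read pairs is quadratic in the number of reads, so this form is only practical for small illustrative inputs; on real datasets the true-negative count would more plausibly be derived arithmetically from the total number of pairs.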
