Conference paper, International Workshop on Algorithms in Bioinformatics

Efficient Local Alignment Discovery amongst Noisy Long Reads


Abstract

Long-read sequencers portend the possibility of producing reference-quality genomes, not only because the reads are long but also because sequencing errors and read sampling are almost perfectly random. However, error rates are as high as 15%, so two reads covering the same stretch of genome can differ at a rate of roughly 30%. Finding local alignments between reads therefore requires an efficient algorithm that works at a 30% difference rate, a level that current algorithm designs either cannot handle or handle only inefficiently. In this paper we present a very efficient yet highly sensitive threaded filter, based on a novel sort-and-merge paradigm, that proposes seed points between pairs of reads that are likely to have a significant local alignment passing through them. We also present a linear expected-time heuristic, based on the classic O(nd) difference algorithm, that finds a local alignment passing through a seed point; the heuristic is exceedingly sensitive, failing only once every billion base pairs. We have combined these two results into a software program called DALIGN, the fastest program to date for finding overlaps and local alignments in very noisy long-read DNA sequencing data sets, and thus a prelude to de novo long-read assembly.
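The sort-and-merge idea behind the seed filter can be illustrated with a simplified, single-threaded sketch, assuming a toy model. Every k-mer occurrence is listed as a (k-mer, read, position) triple. The list is sorted so that identical k-mers become adjacent, and a merge pass over each run of identical k-mers emits hits between distinct reads, each tagged with its diagonal (position in read a minus position in read b). Hits are binned into diagonal bands, and a band that collects enough hits is proposed as a seed. The function names (`kmer_table`, `seed_hits`) and parameters (`k`, `band_bits`, `min_hits`) are hypothetical. Python's built-in sort stands in for whatever sort DALIGN actually uses, and the sketch ignores threading, reverse complements, highly repetitive k-mers, and hits that straddle band boundaries. It is not the paper's implementation.

```python
from collections import defaultdict
from itertools import groupby


def kmer_table(reads, k):
    """List every (k-mer, read index, position) triple, sorted by k-mer."""
    entries = []
    for r, seq in enumerate(reads):
        for i in range(len(seq) - k + 1):
            entries.append((seq[i:i + k], r, i))
    # Sorting makes identical k-mers adjacent; a comparison sort stands in
    # here for the (assumed) faster sort a production tool would use.
    entries.sort()
    return entries


def seed_hits(reads, k=4, band_bits=3, min_hits=3):
    """Propose (read a, read b, diagonal band) seeds with >= min_hits shared k-mers."""
    counts = defaultdict(int)
    # Merge pass: each run of identical k-mers yields hits between distinct reads.
    for _, run in groupby(kmer_table(reads, k), key=lambda e: e[0]):
        run = list(run)
        for _, a, i in run:
            for _, b, j in run:
                if a < b:
                    diagonal = i - j
                    # Bin diagonals into bands of width 2**band_bits;
                    # >> floors, so negative diagonals are binned consistently.
                    counts[(a, b, diagonal >> band_bits)] += 1
    return sorted(key for key, n in counts.items() if n >= min_hits)
```

For example, `seed_hits(["AACCGGTTAC", "TTAACCGGTTAC", "GGGGGGGG"])` reports a single seed, `(0, 1, -1)`: all seven 4-mers shared by reads 0 and 1 lie on diagonal -2, which falls in band -1 when `band_bits=3`, while read 2 shares no k-mers with either.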
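As a reference point for the alignment heuristic, the sketch below implements the classic O(nd) greedy difference algorithm in Myers' formulation. It compares two whole strings and counts insertions and deletions only. It is not DALIGN's heuristic: according to the abstract, the paper's method finds a local alignment passing through a seed point and runs in linear expected time, and that local, seed-anchored extension is not reproduced here. The function name `ond_distance` is hypothetical.

```python
def ond_distance(a, b):
    """Minimum number of insertions plus deletions that turn a into b."""
    n, m = len(a), len(b)
    max_d = n + m
    offset = max_d  # shifts diagonal k (k = x - y) to a non-negative index
    # v[k + offset] holds the furthest x reached so far on diagonal k.
    v = [0] * (2 * max_d + 2)
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1 + offset] < v[k + 1 + offset]):
                x = v[k + 1 + offset]      # extend from diagonal k+1 (insertion)
            else:
                x = v[k - 1 + offset] + 1  # extend from diagonal k-1 (deletion)
            y = x - k
            # Follow the "snake" of matching characters along the diagonal.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k + offset] = x
            if x >= n and y >= m:
                return d  # first d whose wavefront reaches the end
    return max_d
```

For example, `ond_distance("ABCABBA", "CBABAC")` returns 5, the example used in Myers' paper. Work grows with the number of differences d rather than with the product of the string lengths, which is why this algorithm is a natural starting point when reads are long but their difference rate is bounded.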
