BMC Bioinformatics

Accel-Align: a fast sequence mapper and aligner based on the seed–embed–extend method

Abstract

Improvements in sequencing technology continue to drive sequencing cost towards $100 per genome. However, mapping sequenced data to a reference genome remains a computationally intensive task because it depends on edit distance to handle the INDELs and mismatches introduced by sequencing. All modern aligners use the seed–filter–extend (SFE) methodology and rely on filtration heuristics to reduce the overhead of edit distance computation. However, filtering has inherent performance–accuracy trade-offs that limit its effectiveness. Motivated by algorithmic advances in randomized low-distortion embedding, we introduce seed–embed–extend (SEE), a new methodology for developing sequence mappers and aligners. While SFE focuses on eliminating sub-optimal candidates, SEE focuses instead on identifying optimal candidates. To do so, SEE transforms the read and reference strings from the edit distance regime to the Hamming regime by embedding them using a randomized algorithm, and uses Hamming distance over the embedded set to identify optimal candidates. To show that SEE performs well in practice, we present Accel-Align, an SEE-based short-read sequence mapper and aligner that is 3–12× faster than state-of-the-art aligners on commodity CPUs, without any special-purpose hardware, while providing comparable accuracy. As sequencing technologies continue to increase read length while improving throughput and accuracy, we believe that randomized embeddings open up new avenues for optimization that cannot be achieved by using edit distance. Thus, the techniques presented in this paper have a much broader scope, as they can be used for other applications such as graph alignment, multiple sequence alignment, and sequence assembly.
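The embedding step described in the abstract can be illustrated with a minimal sketch. It assumes a CGK-style random-walk embedding, which is one known randomized low-distortion embedding from edit distance to Hamming distance. The abstract does not name the specific algorithm, so this choice is an assumption, and the function names, padding symbol, and seed below are hypothetical and not taken from Accel-Align.

```python
import random

ALPHABET = "ACGT"
PAD = "N"  # hypothetical padding symbol, chosen for this sketch only


def make_walk_bits(out_len, seed=0):
    """Draw the shared random bits: one bit per (output position, symbol).

    The same bits must be used for every string that is to be compared.
    """
    rng = random.Random(seed)
    return [{c: rng.getrandbits(1) for c in ALPHABET} for _ in range(out_len)]


def embed(s, bits):
    """CGK-style embedding (assumed here, not confirmed by the abstract).

    At each output position, copy the current input symbol, then either
    stay on it or advance by one, depending on the shared random bit for
    (position, symbol). Once the input is exhausted, emit padding.
    """
    out = []
    i = 0
    for step in bits:
        if i < len(s):
            c = s[i]
            out.append(c)
            i += step[c]
        else:
            out.append(PAD)
    return "".join(out)


def hamming(a, b):
    """Hamming distance between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_candidate(read, candidates, bits):
    """Pick the candidate whose embedding is closest to the read's embedding."""
    e_read = embed(read, bits)
    return min(candidates, key=lambda c: hamming(e_read, embed(c, bits)))


if __name__ == "__main__":
    read = "ACGTTGCAAC"
    # Hypothetical reference windows at candidate locations.
    candidates = ["TTTTGGGGCC", "ACGTTGCAAC", "ACGTGCAACT"]
    bits = make_walk_bits(3 * len(read), seed=42)
    for c in candidates:
        print(c, hamming(embed(read, bits), embed(c, bits)))
    print("best:", best_candidate(read, candidates, bits))
```

Because the read and every candidate are embedded with the same random bits, identical strings always map to identical embeddings, and each comparison is a linear-time Hamming computation rather than a quadratic edit distance computation. The output length of three times the input length follows the usual CGK construction and is likewise an assumption about what a practical mapper would use.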