MOSAIK: A Hash-Based Algorithm for Accurate Next-Generation Sequencing Short-Read Mapping

Abstract

MOSAIK is a stable, sensitive and open-source program for mapping second- and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific BioSciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well suited to capturing mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, such as mobile element insertions (MEIs), and generates outputs tailored to aid in SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-network-based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient greater than 0.98 between MOSAIK-assigned and actual mapping qualities. To support studies of any genome, a training pipeline is provided that yields optimal mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO.
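
The abstract names the two ingredients of the alignment step, hash clustering and Smith-Waterman, without giving their details. The following is a minimal Python sketch of that general idea, not MOSAIK's implementation: it assumes that seeds are fixed-length k-mers, that seed hits are clustered by diagonal (reference position minus read position), and that a linear gap penalty is used. All function names, the values of `k` and `pad`, and the scoring parameters are illustrative assumptions.

```python
from collections import defaultdict


def build_hash_index(reference, k):
    """Map each k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def best_seed_cluster(read, index, k):
    """Group seed hits by diagonal and return (diagonal, hit_count) of the largest group."""
    diagonals = defaultdict(int)
    for read_pos in range(len(read) - k + 1):
        # .get avoids inserting empty entries into the defaultdict index
        for ref_pos in index.get(read[read_pos:read_pos + k], ()):
            diagonals[ref_pos - read_pos] += 1
    if not diagonals:
        return None
    return max(diagonals.items(), key=lambda item: item[1])


def smith_waterman(query, target, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty; returns (best_score, end_in_target)."""
    prev = [0] * (len(target) + 1)
    best_score, best_end = 0, 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            step = match if query[i - 1] == target[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best_score:
                best_score, best_end = curr[j], j
        prev = curr
    return best_score, best_end


def map_read(read, reference, k=4, pad=3):
    """Pick a candidate window from the best seed cluster, then align the read inside it."""
    index = build_hash_index(reference, k)
    cluster = best_seed_cluster(read, index, k)
    if cluster is None:
        return None  # no seed hit anywhere: the read stays unmapped
    diagonal, hits = cluster
    start = max(0, diagonal - pad)
    end = min(len(reference), diagonal + len(read) + pad)
    score, end_in_window = smith_waterman(read, reference[start:end])
    return {"window_start": start, "seed_hits": hits,
            "score": score, "ref_end": start + end_in_window}


if __name__ == "__main__":
    reference = "GATTACAGGCTTCAGTCCGATAGCTAACGTTGCA"
    read = reference[10:22]  # an error-free read drawn from the reference
    print(map_read(read, reference))
```

Restricting Smith-Waterman to a padded window around the densest seed cluster keeps the quadratic alignment cost proportional to the read length rather than the genome length, while the padding leaves room for short insertions and deletions. A production aligner would also need traceback, both strands, and a mapping quality model, none of which are shown here.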