Journal: BMC Genomics

Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data

Abstract

Background: The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a suitable mapper for a given technology and a given application is a subtle task because of the difficulty of evaluating mapping algorithms.

Results: In this paper, we present a benchmark procedure to compare mapping algorithms used in HTS using both real and simulated datasets and considering four evaluation criteria: computational resource and time requirements, robustness of mapping, ability to report positions for reads in repetitive regions, and ability to retrieve true genetic variation positions. To measure robustness, we introduced a new definition for a correctly mapped read taking into account not only the expected start position of the read but also the end position and the number of indels and substitutions. We developed CuReSim, a new read simulator, that is able to generate customized benchmark data for any kind of HTS technology by adjusting parameters to the error types. CuReSim and CuReSimEval, a tool to evaluate the mapping quality of the CuReSim simulated reads, are freely available. We applied our benchmark procedure to evaluate 14 mappers in the context of whole genome sequencing of small genomes with Ion Torrent data for which such a comparison has not yet been established.

Conclusions: A benchmark procedure to compare HTS data mappers is introduced with a new definition for the mapping correctness as well as tools to generate simulated reads and evaluate mapping quality. The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes has allowed us to validate our benchmark procedure and demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the technology used. This benchmark procedure can be used to evaluate existing or in-development mappers as well as to optimize parameters of a chosen mapper for any application and any sequencing platform.
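
The correctness definition described in the abstract (start position, end position, and the number of indels and substitutions) can be illustrated with a small sketch. This is not the CuReSimEval implementation: the `MappedRead` record, the `is_correctly_mapped` function, and the two tolerance parameters are hypothetical names and assumptions introduced here only to show how such a criterion differs from a check on the start position alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedRead:
    """Hypothetical record for one read placement on the reference."""
    start: int          # leftmost reference position covered by the read
    end: int            # rightmost reference position covered by the read
    insertions: int     # number of inserted bases
    deletions: int      # number of deleted bases
    substitutions: int  # number of mismatched bases


def is_correctly_mapped(expected: MappedRead,
                        observed: MappedRead,
                        pos_tolerance: int = 0,
                        edit_tolerance: int = 0) -> bool:
    """Return True if the observed placement agrees with the simulated truth.

    Assumption: positions may differ by at most pos_tolerance bases and each
    edit count by at most edit_tolerance. The thresholds used by the authors
    are not stated in the abstract.
    """
    if abs(expected.start - observed.start) > pos_tolerance:
        return False
    if abs(expected.end - observed.end) > pos_tolerance:
        return False
    edit_pairs = [
        (expected.insertions, observed.insertions),
        (expected.deletions, observed.deletions),
        (expected.substitutions, observed.substitutions),
    ]
    return all(abs(e - o) <= edit_tolerance for e, o in edit_pairs)


# Simulated truth: a read with one insertion and two substitutions.
truth = MappedRead(start=100, end=199, insertions=1, deletions=0, substitutions=2)

# A placement with the right start but a different end and edit profile.
start_only_match = MappedRead(start=100, end=203, insertions=0, deletions=0, substitutions=0)

# A placement shifted by one base with the same edit profile.
shifted = MappedRead(start=101, end=200, insertions=1, deletions=0, substitutions=2)
```

In this sketch, `start_only_match` would pass a check on the start position alone but fails the stricter criterion, while `shifted` passes only when `pos_tolerance` is at least 1.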