Conference: IEEE International Conference on Bioinformatics and Biomedicine

GSDcreator: An Efficient and Comprehensive Simulator for Generating NGS Data with Population Genetic Information


Abstract

In recent decades, NGS data analysis has become a major research field in bioinformatics, offering great advantages in many application scenarios. Many algorithms and software tools have been designed to analyze NGS data, and simulated datasets are urgently needed to test these tools and optimize their parameter configurations. A series of NGS data simulators has therefore been published. However, the existing simulators cannot satisfy the requirements of many specific scenarios. First, they do not support many newly discovered variations. Second, complex structural variations are difficult to generate. Third, as population data grows, support for simulating population information is urgently needed. In this paper, we propose GSDcreator, a comprehensive NGS simulator that overcomes these three weaknesses. It can produce all known types of variation, and complex variations are also supported. Furthermore, it can capture many important features of real data, including population polymorphism, insert size distribution, adjacent-site depth distribution, overall depth distribution, quality score distribution, amplification bias, and sequencing errors. Notably, GSDcreator takes the 1000 Genomes Project database as a reference and integrates population genetic information to simulate population polymorphism. To test its performance, we conducted extensive experiments and found that the simulated data produced by GSDcreator closely mimic real sequencing data.
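To make two of the features named in the abstract concrete (insert size distribution and sequencing errors), here is a minimal Python sketch of a paired-end read sampler. It is not GSDcreator's algorithm, which the abstract does not describe. The function names, the normal insert-size model, the uniform substitution-only error model, and all default parameters are assumptions chosen for illustration.

```python
import random

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def add_errors(read, error_rate, rng):
    """Replace each base with a different base with probability error_rate.

    Assumption: a uniform, substitution-only error model (no indels).
    """
    out = []
    for base in read:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_pairs(reference, n_pairs, read_len=50, insert_mean=200,
                   insert_sd=20, error_rate=0.01, seed=0):
    """Yield (read1, read2, start, insert_size) tuples sampled from reference.

    Assumption: insert sizes follow a normal distribution, clamped so that
    each fragment holds at least one read and fits inside the reference.
    """
    if len(reference) < read_len:
        raise ValueError("reference is shorter than read_len")
    rng = random.Random(seed)
    for _ in range(n_pairs):
        insert = int(round(rng.gauss(insert_mean, insert_sd)))
        insert = max(read_len, min(insert, len(reference)))
        start = rng.randint(0, len(reference) - insert)
        fragment = reference[start:start + insert]
        # Read 1 comes from the fragment's 5' end on the forward strand;
        # read 2 is the reverse complement of the fragment's 3' end.
        read1 = add_errors(fragment[:read_len], error_rate, rng)
        read2 = add_errors(reverse_complement(fragment[-read_len:]),
                           error_rate, rng)
        yield read1, read2, start, insert


if __name__ == "__main__":
    rng = random.Random(42)
    reference = "".join(rng.choice(BASES) for _ in range(5000))
    for r1, r2, start, insert in simulate_pairs(reference, 3):
        print(start, insert, r1[:20], r2[:20])
```

A simulator of the kind the abstract describes would replace these toy models with distributions fitted to real data, and would apply variants (for example, drawn with population allele frequencies) to the reference before sampling fragments.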
