Collection: U.S. National Institutes of Health literature > PLoS Clinical Trials

Genome-Wide SNP Calling from Genotyping by Sequencing (GBS) Data: A Comparison of Seven Pipelines and Two Sequencing Technologies

Abstract

Next-generation sequencing (NGS) has revolutionized plant and animal research in many ways, including new methods for high-throughput genotyping. Genotyping-by-sequencing (GBS) has been shown to be a robust and cost-effective genotyping method capable of producing thousands to millions of SNPs across a wide range of species. The greatest barrier to its broader use is undoubtedly the challenge of data analysis. Here we describe a comprehensive comparison of seven GBS bioinformatics pipelines developed to process raw GBS sequence data into SNP genotypes. We compared five pipelines that require a reference genome (TASSEL-GBS v1 and v2, Stacks, IGST, and Fast-GBS) and two de novo pipelines that do not (UNEAK and Stacks). Using Illumina sequence data from a set of 24 re-sequenced soybean lines, we performed SNP calling with each pipeline and compared the GBS SNP calls with the re-sequencing data to assess their accuracy. Fewer SNPs were called without a reference genome (13k to 24k) than with one (25k to 54k), while accuracy was high (92.3 to 98.7%) for all pipelines but one (TASSEL-GBSv1, 76.1%). Among the pipelines offering high accuracy (>95%), Fast-GBS called the greatest number of polymorphisms (close to 35,000 SNPs + indels) and yielded the highest accuracy (98.7%). Using Ion Torrent sequence data for the same 24 lines, we compared the performance of Fast-GBS with that of TASSEL-GBSv2. Fast-GBS again called more polymorphisms (25.8k vs 22.9k), and these proved more accurate (95.2 vs 91.1%). SNP catalogues called from the same sequencing data with different pipelines typically overlapped extensively (79–92%). In contrast, catalogues obtained with the same pipeline but different sequencing technologies overlapped less (~50–70%).
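The abstract ranks pipelines by two metrics, agreement of GBS calls with re-sequencing genotypes ("accuracy") and the overlap between SNP catalogues, but does not define either one precisely. The Python sketch below shows one plausible way to compute them, under assumptions that are ours and not the authors': calls are keyed by (sample, chromosome, position), a call is accurate when its genotype string equals the re-sequencing genotype at the same key, and overlap is the number of shared SNP sites divided by the size of the smaller catalogue. The function names, data layout, and toy data are all hypothetical.

```python
"""Hypothetical sketch: scoring GBS SNP calls against re-sequencing genotypes.

Assumed definitions (not taken from the paper):
- accuracy = matching genotypes / GBS calls that re-sequencing also genotyped
- overlap  = shared SNP sites / size of the smaller catalogue
Genotype strings are assumed to be normalized (e.g. "CT", never "TC").
"""
from typing import Dict, Set, Tuple

Key = Tuple[str, str, int]  # (sample, chromosome, position)
Site = Tuple[str, int]      # (chromosome, position)


def genotype_accuracy(gbs: Dict[Key, str], reseq: Dict[Key, str]) -> float:
    """Fraction of GBS calls, among those also genotyped by re-sequencing, that agree."""
    shared = [k for k in gbs if k in reseq]
    if not shared:
        return 0.0
    agree = sum(1 for k in shared if gbs[k] == reseq[k])
    return agree / len(shared)


def catalogue_overlap(a: Set[Site], b: Set[Site]) -> float:
    """Shared SNP sites as a fraction of the smaller catalogue (assumed definition)."""
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0
    return len(a & b) / smaller


if __name__ == "__main__":
    # Toy data invented for illustration only; not from the study.
    gbs_calls = {
        ("line01", "Gm01", 1200): "AA",
        ("line01", "Gm01", 5400): "GG",
        ("line02", "Gm01", 1200): "AA",
        ("line02", "Gm02", 880): "CT",
    }
    reseq_calls = {
        ("line01", "Gm01", 1200): "AA",
        ("line01", "Gm01", 5400): "GA",  # disagrees with the GBS call
        ("line02", "Gm01", 1200): "AA",
        ("line02", "Gm02", 880): "CT",
    }
    print(f"accuracy = {genotype_accuracy(gbs_calls, reseq_calls):.1%}")  # accuracy = 75.0%

    pipeline_x = {("Gm01", 1200), ("Gm01", 5400), ("Gm02", 880)}
    pipeline_y = {("Gm01", 1200), ("Gm02", 880), ("Gm03", 42)}
    print(f"overlap = {catalogue_overlap(pipeline_x, pipeline_y):.1%}")  # overlap = 66.7%
```

Dividing by the smaller catalogue means a small but fully contained catalogue scores 100%; a Jaccard index (shared sites divided by the union) is a stricter alternative, and which denominator the study used is not stated here.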
