Journal: BMC Bioinformatics

A comparison of genotyping-by-sequencing analysis methods on low-coverage crop datasets shows advantages of a new workflow, GB-eaSy


Abstract

Genotyping-by-sequencing (GBS), a method to identify genetic variants and quickly genotype samples, reduces genome complexity by using restriction enzymes to divide the genome into fragments whose ends are sequenced on short-read sequencing platforms. While cost-effective, this method produces extensive missing data and requires complex bioinformatics analysis. GBS is most commonly used on crop plant genomes, and because crop plants have highly variable ploidy and repeat content, the performance of GBS analysis software can vary by target organism. Here we focus our analysis on soybean, a polyploid crop with a highly duplicated genome, relatively little public GBS data and few dedicated tools.

We compared the performance of five GBS pipelines using low-coverage Illumina sequence data from three soybean populations. To address issues identified with existing methods, we developed GB-eaSy, a GBS bioinformatics workflow that incorporates widely used genomics tools, parallelization and automation to increase the accuracy and accessibility of GBS data analysis. Compared to other GBS pipelines, GB-eaSy rapidly and accurately identified the greatest number of SNPs, with SNP calls closely concordant with whole-genome sequencing of selected lines. Across all five GBS analysis platforms, SNP calls showed unexpectedly low convergence but generally high accuracy, indicating that the workflows arrived at largely complementary sets of valid SNP calls on the low-coverage data analyzed.

We show that GB-eaSy is approximately as good as, or better than, other leading software solutions in the accuracy, yield and missing data fraction of variant calling, as tested on low-coverage genomic data from soybean. It also performs well relative to other solutions in terms of the run time and disk space required. In addition, GB-eaSy is built from existing open-source, modular software packages that are regularly updated and commonly used, making it straightforward to install and maintain. While GB-eaSy outperformed other individual methods on the datasets analyzed, our findings suggest that a comprehensive approach integrating the results from multiple GBS bioinformatics pipelines may be the optimal strategy to obtain the largest, most highly accurate SNP yield possible from low-coverage polyploid sequence data.
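
The idea of low convergence between pipelines, and of integrating their results, can be illustrated with a minimal sketch. It is not the authors' method: the pipeline names, the SNP keys (chromosome, position, alternate allele) and the `min_support` threshold are all hypothetical, chosen only to show how overlap between call sets might be measured and how a merged call set might be built.

```python
from itertools import combinations

# Hypothetical call sets; each SNP is keyed by (chromosome, position, alt allele).
calls = {
    "pipeline_a": {("Gm01", 1050, "T"), ("Gm01", 2210, "G"), ("Gm02", 310, "A")},
    "pipeline_b": {("Gm01", 1050, "T"), ("Gm02", 310, "A"), ("Gm02", 990, "C")},
    "pipeline_c": {("Gm01", 1050, "T"), ("Gm03", 75, "G")},
}


def jaccard(a, b):
    """Shared calls divided by all calls made by either pipeline."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(call_sets):
    """Jaccard overlap for every pair of pipelines."""
    return {
        (x, y): jaccard(call_sets[x], call_sets[y])
        for x, y in combinations(sorted(call_sets), 2)
    }


def merged_calls(call_sets, min_support=1):
    """SNPs reported by at least `min_support` pipelines (1 = plain union)."""
    support = {}
    for snps in call_sets.values():
        for snp in snps:
            support[snp] = support.get(snp, 0) + 1
    return {snp for snp, n in support.items() if n >= min_support}


overlap = pairwise_overlap(calls)
union_calls = merged_calls(calls, min_support=1)
consensus_calls = merged_calls(calls, min_support=2)
```

Setting `min_support=1` keeps every call from every pipeline, which maximizes yield; raising it trades yield for agreement between pipelines. Which setting is appropriate would depend on how accurate each pipeline's unique calls are, something this sketch does not assess.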
