
A comparison of genotyping-by-sequencing analysis methods on low-coverage crop datasets shows advantages of a new workflow, GB-eaSy



Abstract

Background
Genotyping-by-sequencing (GBS), a method to identify genetic variants and quickly genotype samples, reduces genome complexity by using restriction enzymes to divide the genome into fragments whose ends are sequenced on short-read sequencing platforms. While cost-effective, this method produces extensive missing data and requires complex bioinformatics analysis. GBS is most commonly used on crop plant genomes, and because crop plants have highly variable ploidy and repeat content, the performance of GBS analysis software can vary by target organism. Here we focus our analysis on soybean, a polyploid crop with a highly duplicated genome, relatively little public GBS data and few dedicated tools.

Results
We compared the performance of five GBS pipelines using low-coverage Illumina sequence data from three soybean populations. To address issues identified with existing methods, we developed GB-eaSy, a GBS bioinformatics workflow that incorporates widely used genomics tools, parallelization and automation to increase the accuracy and accessibility of GBS data analysis. Compared to other GBS pipelines, GB-eaSy rapidly and accurately identified the greatest number of SNPs, with SNP calls closely concordant with whole-genome sequencing of selected lines. Across all five GBS analysis platforms, SNP calls showed unexpectedly low convergence but generally high accuracy, indicating that the workflows arrived at largely complementary sets of valid SNP calls on the low-coverage data analyzed.

Conclusions
We show that GB-eaSy is approximately as good as, or better than, other leading software solutions in the accuracy, yield and missing data fraction of variant calling, as tested on low-coverage genomic data from soybean. It also performs well relative to other solutions in terms of the run time and disk space required.
In addition, GB-eaSy is built from existing open-source, modular software packages that are regularly updated and commonly used, making it straightforward to install and maintain. While GB-eaSy outperformed other individual methods on the datasets analyzed, our findings suggest that a comprehensive approach integrating the results from multiple GBS bioinformatics pipelines may be the optimal strategy to obtain the largest, most highly accurate SNP yield possible from low-coverage polyploid sequence data.
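The complexity-reduction step described in the Background can be illustrated with a toy in-silico digest. The sketch below assumes the enzyme ApeKI (recognition site G^CWGC, W = A or T), which is a common choice in GBS protocols but is not named in the abstract. The size window and read length are hypothetical toy parameters, not values from the study. Only fragments flanked by cut sites on both sides are kept, because only those carry restriction overhangs at both ends for adapter ligation.

```python
import re

# ApeKI recognizes GCWGC (W = A or T) and cuts after the first base (G^CWGC).
# The enzyme is an illustrative assumption; the abstract does not name one.
# The zero-width lookahead lets finditer report overlapping sites as well.
APEKI_SITE = re.compile(r"(?=GC[AT]GC)")
CUT_OFFSET = 1


def cut_positions(seq: str) -> list[int]:
    """Return the 0-based top-strand positions where ApeKI would cut."""
    return [m.start() + CUT_OFFSET for m in APEKI_SITE.finditer(seq.upper())]


def digest(seq: str) -> list[str]:
    """Return only the fragments lying between two consecutive cut sites."""
    seq = seq.upper()
    cuts = cut_positions(seq)
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]


def sequenced_ends(seq: str, read_len: int = 10,
                   min_len: int = 20, max_len: int = 500) -> list[tuple[str, str]]:
    """Size-select fragments and return the (left, right) end of each.

    Both ends are shown as top-strand sequence; a real reverse read would be
    the reverse complement of the right end. read_len, min_len and max_len
    are hypothetical toy values.
    """
    ends = []
    for frag in digest(seq):
        if min_len <= len(frag) <= max_len:
            ends.append((frag[:read_len], frag[-read_len:]))
    return ends


if __name__ == "__main__":
    genome = "AAAA" + "GCAGC" + "T" * 30 + "GCTGC" + "A" * 10 + "GCAGC" + "TT"
    print(cut_positions(genome))               # [5, 40, 55]
    print([len(f) for f in digest(genome)])    # [35, 15]
    print(sequenced_ends(genome))              # [('CAGCTTTTTT', 'TTTTTTTTTG')]
```

Only the sequence near cut sites is ever read, which is why GBS is cheap. The same property explains why loci missing a site in some lines, or covered by too few reads, show up as missing data.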
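The contrast between "low convergence" and "high accuracy" rests on two separate measurements:

- how much the SNP sets from different pipelines overlap;
- how often each pipeline's genotypes agree with whole-genome sequencing (WGS) at sites both assays cover.

A minimal sketch of both measurements follows, plus a naive merge in the spirit of the suggested multi-pipeline strategy. Several choices are assumptions for illustration, not the paper's method:

- the data model, in which calls are keyed by (chromosome, position) with genotype strings such as "0/1";
- the Jaccard index as the overlap measure;
- the rule of dropping sites where pipelines disagree.

```python
from itertools import combinations

# Hypothetical data model: one pipeline's calls for one sample, mapping
# (chromosome, position) -> genotype string such as "0/1".
Site = tuple[str, int]
Calls = dict[Site, str]


def jaccard(a: set, b: set) -> float:
    """Overlap of two site sets: |A & B| / |A | B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(call_sets: dict[str, Calls]) -> dict[tuple[str, str], float]:
    """Jaccard overlap of SNP sites for every pair of pipelines ("convergence")."""
    return {
        (n1, n2): jaccard(set(c1), set(c2))
        for (n1, c1), (n2, c2) in combinations(call_sets.items(), 2)
    }


def concordance(calls: Calls, truth: Calls) -> float:
    """Fraction of sites shared with the WGS calls whose genotypes match ("accuracy")."""
    shared = calls.keys() & truth.keys()
    if not shared:
        return float("nan")
    return sum(calls[s] == truth[s] for s in shared) / len(shared)


def merge_calls(call_sets: dict[str, Calls]) -> Calls:
    """Union of all pipelines' calls, dropping sites with conflicting genotypes."""
    merged: Calls = {}
    conflicts: set[Site] = set()
    for calls in call_sets.values():
        for site, gt in calls.items():
            if site in merged and merged[site] != gt:
                conflicts.add(site)
            merged.setdefault(site, gt)
    for site in conflicts:
        del merged[site]
    return merged
```

Two pipelines can each score close to 1.0 on `concordance` while their `pairwise_overlap` stays low. In that situation a union such as `merge_calls` yields more SNP sites than any single pipeline, which is the intuition behind integrating several workflows.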
