Listed under: NIH literature > PLoS Clinical Trials

GenoCore: A simple and fast algorithm for core subset selection from large genotype datasets



Abstract

Selecting core subsets from plant genotype datasets is important for improving cost-effectiveness and shortening the time required for analyses such as genome-wide association studies (GWAS) and genomics-assisted breeding of crop species. Recently, large numbers of genetic markers (>100,000 single nucleotide polymorphisms) have been identified from high-density single nucleotide polymorphism (SNP) arrays and next-generation sequencing (NGS) data. However, no software is available that can pick out an efficient and consistent core subset from such a large dataset. Software is needed that can consistently extract the genetically important samples in a population. Here we present a new program, GenoCore, which quickly and efficiently finds a core subset representing the entire population. We introduce simple coverage and diversity scores, which reflect genotype errors and genetic variation and help to select samples rapidly and accurately from crop genotype datasets. We compared our method with other core collection software on example datasets to validate its performance in terms of genetic distance, diversity, coverage, required system resources, and the number of selected samples. GenoCore selected the smallest, most consistent, and most representative core collection from all samples, used less memory with more efficient scores, and showed greater genetic coverage than the other software tested. GenoCore is written in R and can be accessed online together with an example dataset and test results.
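
The abstract does not define the coverage and diversity scores, so the following is only a minimal sketch of one common way to do coverage-driven core selection, not GenoCore's actual algorithm. It assumes that coverage means the fraction of distinct (marker, genotype) pairs in the full dataset that also occur in the selected subset, and it greedily adds the sample contributing the most unseen pairs. The function names `allele_pairs` and `greedy_core` and the `target_coverage` parameter are hypothetical, and Python is used here purely for illustration even though GenoCore itself is written in R.

```python
from typing import List, Optional, Sequence, Set, Tuple

Genotype = Optional[str]  # None marks a missing genotype call


def allele_pairs(sample: Sequence[Genotype]) -> Set[Tuple[int, str]]:
    """Return the (marker index, genotype) pairs observed in one sample."""
    return {(i, g) for i, g in enumerate(sample) if g is not None}


def greedy_core(
    samples: Sequence[Sequence[Genotype]],
    target_coverage: float = 1.0,
) -> Tuple[List[int], float]:
    """Greedy coverage-based selection (illustrative assumption, not GenoCore).

    Repeatedly picks the sample that adds the most not-yet-covered
    (marker, genotype) pairs until the target coverage is reached.
    Ties are broken in favour of the lowest sample index.
    """
    per_sample = [allele_pairs(s) for s in samples]
    universe: Set[Tuple[int, str]] = set().union(*per_sample)
    covered: Set[Tuple[int, str]] = set()
    chosen: List[int] = []
    remaining = set(range(len(samples)))

    while remaining and universe and len(covered) / len(universe) < target_coverage:
        # max() keeps the first maximal item, so sorting gives index tie-breaking.
        best = max(sorted(remaining), key=lambda i: len(per_sample[i] - covered))
        if not per_sample[best] - covered:
            break  # no remaining sample adds anything new
        chosen.append(best)
        covered |= per_sample[best]
        remaining.discard(best)

    coverage = len(covered) / len(universe) if universe else 1.0
    return chosen, coverage


if __name__ == "__main__":
    # Toy data: 4 samples x 3 markers; sample 1 duplicates sample 0.
    toy = [
        ["AA", "AG", "GG"],
        ["AA", "AG", "GG"],
        ["AG", "AG", None],
        ["GG", "CC", "GG"],
    ]
    print(greedy_core(toy))
```

In this toy run the duplicate sample is never selected, which illustrates why a coverage-driven core can be much smaller than the full collection. A real tool would also need a diversity criterion and a strategy for genotyping errors, both of which this sketch leaves out.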
