...
Journal: G3: Genes, Genomes, Genetics

Genetic Diversity Analysis of Highly Incomplete SNP Genotype Data with Imputations: An Empirical Assessment


Abstract

Genotyping by sequencing (GBS) has recently emerged as a promising genomic approach for assessing genetic diversity on a genome-wide scale. However, there are concerns about the uniquely large imbalance in GBS genotype data. Although several genotype imputation methods have been proposed to infer missing observations, little is known about the reliability of genetic diversity analysis of GBS data in which up to 90% of observations are missing. Here we performed an empirical assessment of the accuracy of genetic diversity analysis of highly incomplete single nucleotide polymorphism (SNP) genotypes with imputation. Three large SNP genotype data sets for corn, wheat, and rice were acquired; missing data with up to 90% of observations missing were randomly generated, and the missing genotypes were then imputed with three map-independent imputation methods. Estimates of heterozygosity and the inbreeding coefficient from the original, missing, and imputed data revealed variable patterns of bias across the assessed levels of missingness and genotype imputation, but the estimation biases were smaller for missing data without genotype imputation. Estimates of genetic differentiation were rather robust up to 90% missing observations but became substantially biased when missing genotypes were imputed. Estimates of topology accuracy for four representative samples of groups of interest generally decreased as the level of missing genotypes increased. Imputation based on probabilistic principal component analysis (PPCA) achieved higher topology accuracy than analyses of missing data without genotype imputation. These findings are significant for understanding the reliability of genetic diversity analysis with large amounts of missing data and genotype imputation, and they are instructive for performing a proper genetic diversity analysis of highly incomplete GBS or other genotype data.
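The masking-then-estimation design summarized above can be illustrated with a small NumPy sketch. It is not the authors' pipeline, and several assumptions are made purely for illustration: genotypes are coded 0/1/2 as counts of one allele (individuals in rows, SNPs in columns), missingness is drawn uniformly at random, expected heterozygosity is taken as He = 2p(1-p) without a small-sample correction, and F is averaged per SNP as 1 - Ho/He. The function names are hypothetical, and `mean_impute` is a deliberately naive stand-in (each SNP's mean genotype rounded to the nearest code). It is not PPCA or any of the three map-independent methods evaluated in the study.

```python
import numpy as np


def mask_at_random(G, rate, rng):
    """Copy genotype matrix G (individuals x SNPs, coded 0/1/2) as floats and
    set a fraction `rate` of entries to NaN, chosen uniformly at random."""
    M = np.asarray(G, dtype=float).copy()
    M[rng.random(M.shape) < rate] = np.nan
    return M


def diversity_stats(M):
    """Mean expected heterozygosity He = 2p(1-p), mean observed heterozygosity Ho,
    and mean per-SNP inbreeding coefficient F = 1 - Ho/He, using only non-missing
    calls. SNPs with no calls or He == 0 (monomorphic) are skipped."""
    M = np.asarray(M, dtype=float)
    n = (~np.isnan(M)).sum(axis=0)            # called individuals per SNP
    keep = n > 0
    M, n = M[:, keep], n[keep]
    p = np.nansum(M, axis=0) / (2 * n)        # frequency of the counted allele
    he = 2 * p * (1 - p)
    ho = (M == 1).sum(axis=0) / n             # NaN == 1 is False, so missing calls are ignored
    poly = he > 0
    f = 1 - ho[poly] / he[poly]
    return he[poly].mean(), ho[poly].mean(), f.mean()


def mean_impute(M):
    """Hypothetical naive imputer: fill each SNP's missing calls with its mean
    genotype rounded to the nearest code (0, 1 or 2)."""
    fill = np.rint(np.nanmean(M, axis=0))
    return np.where(np.isnan(M), fill, M)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    freqs = rng.uniform(0.05, 0.5, size=1000)
    G = rng.binomial(2, freqs, size=(200, 1000))  # genotypes simulated under HWE
    for rate in (0.0, 0.5, 0.9):
        M = mask_at_random(G, rate, rng)
        print(f"missing={rate:.1f}",
              "unimputed He/Ho/F:", np.round(diversity_stats(M), 3),
              "imputed He/Ho/F:", np.round(diversity_stats(mean_impute(M)), 3))
```

Because this imputer assigns the same genotype code to every missing call at a given SNP, comparing the unimputed and imputed estimates at increasing missingness rates shows how even a simple imputation step can shift heterozygosity and F away from estimates computed on the called genotypes alone. Real GBS missingness is usually not uniformly random, so the masking step is only a first approximation.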