PLoS Genetics

Using Whole-Genome Sequence Data to Predict Quantitative Trait Phenotypes in Drosophila melanogaster

Abstract

Predicting organismal phenotypes from genotype data is important for plant and animal breeding, medicine, and evolutionary biology. Genomic-based phenotype prediction has been applied for single-nucleotide polymorphism (SNP) genotyping platforms, but not using complete genome sequences. Here, we report genomic prediction for starvation stress resistance and startle response in Drosophila melanogaster, using ~2.5 million SNPs determined by sequencing the Drosophila Genetic Reference Panel population of inbred lines. We constructed a genomic relationship matrix from the SNP data and used it in a genomic best linear unbiased prediction (GBLUP) model. We assessed predictive ability as the correlation between predicted genetic values and observed phenotypes by cross-validation, and found a predictive ability of 0.239±0.008 (0.230±0.012) for starvation resistance (startle response). The predictive ability of BayesB, a Bayesian method with internal SNP selection, was not greater than GBLUP. Selection of the 5% SNPs with either the highest absolute effect or variance explained did not improve predictive ability. Predictive ability decreased only when fewer than 150,000 SNPs were used to construct the genomic relationship matrix. We hypothesize that predictive power in this population stems from the SNP-based modeling of the subtle relationship structure caused by long-range linkage disequilibrium and not from population structure or SNPs in linkage disequilibrium with causal variants. We discuss the implications of these results for genomic prediction in other organisms.

Author Summary

The ability to accurately predict values of complex phenotypes from genotype data will revolutionize plant and animal breeding, personalized medicine, and evolutionary biology. To date, genomic prediction has utilized high-density single-nucleotide polymorphism (SNP) genotyping arrays, but the availability of sequence data opens new frontiers for genomic prediction methods. This article is the first application of genomic phenotype prediction using whole-genome sequence data in a substantial sample of a higher eukaryote. We use ~2.5 million SNPs with minor allele frequency greater than 2.5% derived from genomic sequences of the “Drosophila Genetic Reference Panel” to predict phenotypes for two traits, starvation resistance and startle-induced locomotor behavior. We systematically address prediction within versus across sexes, genomic best linear unbiased prediction (GBLUP) versus a Bayesian approach, and the effect of SNP density. We find that (i) genomic prediction can be efficiently implemented using sequence data via GBLUP, (ii) there is little gain in predictive ability if the number of SNPs is increased above 150,000, and (iii) neither implicit nor explicit marker selection substantially improves the predictive ability. Although the findings must be seen against the background of small sample sizes, the results illustrate both the potential of the approach and the challenges ahead.
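
The workflow described in the abstract (build a genomic relationship matrix from SNPs, fit GBLUP, and score predictive ability as the cross-validated correlation between predicted genetic values and observed phenotypes) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a VanRaden-style relationship matrix, a fixed heritability `h2` in place of estimated variance components, 5-fold cross-validation, and simulated genotypes and phenotypes; all function names are hypothetical.

```python
import numpy as np

def genomic_relationship_matrix(M):
    """Relationship matrix from an (n_lines, n_snps) genotype matrix.

    Assumption: VanRaden-style scaling; genotypes coded as allele counts
    (0 or 2 for fully homozygous inbred lines).
    """
    p = M.mean(axis=0) / 2.0                 # allele frequency per SNP
    Z = M - 2.0 * p                          # centre each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y, train, test, h2=0.5):
    """Predict genetic values of test lines from training phenotypes.

    Assumption: h2 is fixed here; in practice the variance components
    would be estimated (e.g. by REML) rather than supplied.
    """
    lam = (1.0 - h2) / h2                    # residual / genetic variance ratio
    mu = y[train].mean()
    G_tt = G[np.ix_(train, train)]           # training-by-training block
    G_st = G[np.ix_(test, train)]            # test-by-training block
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + G_st @ alpha

def cv_predictive_ability(G, y, k=5, seed=0):
    """Mean over folds of cor(predicted genetic value, observed phenotype)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)      # all lines not in this fold
        pred = gblup_predict(G, y, train, fold)
        cors.append(np.corrcoef(pred, y[fold])[0, 1])
    return float(np.mean(cors))

# Simulated example (hypothetical data, not the DGRP lines).
rng = np.random.default_rng(1)
n_lines, n_snps = 150, 2000
M = 2.0 * rng.integers(0, 2, size=(n_lines, n_snps))   # homozygous genotypes
g = (M - M.mean(axis=0)) @ rng.normal(size=n_snps)     # simulated genetic values
g = g / g.std()
y = g + rng.normal(size=n_lines)                       # simulated h2 of about 0.5

G = genomic_relationship_matrix(M)
r = cv_predictive_ability(G, y)
print(f"cross-validated predictive ability: {r:.3f}")
```

To mimic the SNP-density comparison in the abstract, the same functions could be rerun with `G` built from a random subset of the columns of `M`.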
