Published in: Journal of Animal Science

The impact of clustering methods for cross-validation, choice of phenotypes, and genotyping strategies on the accuracy of genomic predictions



Abstract

For genomic predictors to be of use in genetic evaluation, their predicted accuracy must be a reliable indicator of their utility, and thus unbiased. The objective of this paper was to evaluate the accuracy of prediction of genomic breeding values (GBV) using different clustering strategies and response variables. Red Angus genotypes (n = 9,763) were imputed to a reference 50K panel. The influence of clustering method [k-means, k-medoids, principal component analysis on the numerator relationship matrix and the identical-by-state genomic relationship matrix as both data and covariance matrices, and random] and response variables [deregressed estimated breeding values (DEBV) and adjusted phenotypes] was evaluated for cross-validation. The GBV were estimated using a Bayes C model for all traits. Traits for DEBV included birth weight (BWT), marbling, rib-eye area (REA), and yearling weight (YWT). Adjusted phenotypes included BWT, YWT, and ultrasonically measured intramuscular fat percentage and REA. Prediction accuracies were estimated as the genetic correlation between GBV and the associated response variable using a bivariate animal model. A simulation mimicking a cattle population, replicated 5 times, was conducted to quantify differences between true and estimated accuracies. The simulation used the same clustering methods and response variables, with the addition of 2 genotyping strategies (random and top 25% of individuals) and forward validation. The prediction accuracies were estimated similarly, and true accuracies were estimated as the correlation between the residuals of a bivariate model including true breeding value and GBV. Based on the adjusted Rand index, random clusters were clearly different from those produced by relationship-based clustering methods. In both real and simulated data, random clustering consistently led to the largest estimates of accuracy, while no method was consistently associated with more or less bias than other methods.
In simulation, random genotyping led to higher estimated accuracies than selection of the top 25% of individuals. Interestingly, random genotyping seemed to overpredict true accuracy, while selective genotyping tended to underpredict it. When forward-in-time validation was used, DEBV led to less biased estimates of GBV accuracy. Results suggest that the highest, least biased GBV accuracies are associated with random genotyping and DEBV.
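
The adjusted Rand index mentioned in the abstract measures how closely two partitions of the same animals agree, corrected for chance agreement. The following is a minimal sketch of that comparison, not the authors' code: the fold labels are hypothetical, with contiguous blocks standing in for relationship-based clusters and a seeded random draw standing in for random clustering.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on
    # Count pairs of items placed together in both, in A, and in B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)  # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # both partitions are trivial (one cluster or all singletons)
    return (together_both - expected) / (maximum - expected)


# Hypothetical example: 500 animals assigned to 5 cross-validation folds.
n_animals, n_folds = 500, 5
rng = random.Random(0)

# Stand-in for relationship-based clusters: contiguous blocks of related animals.
family_folds = [i * n_folds // n_animals for i in range(n_animals)]
# Random clustering: each animal is assigned to a fold independently.
random_folds = [rng.randrange(n_folds) for _ in range(n_animals)]

ari_identical = adjusted_rand_index(family_folds, family_folds)
ari_random = adjusted_rand_index(family_folds, random_folds)
print(f"identical partitions: {ari_identical:.3f}")
print(f"family vs. random:    {ari_random:.3f}")
```

An index near 1 indicates that two methods group animals almost identically, while an index near 0 indicates agreement no better than chance, which is the pattern the abstract reports for random clusters versus relationship-based ones.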
