
Random forest estimation of genomic breeding values for disease susceptibility over different disease incidences and genomic architectures in simulated cow calibration groups


Abstract

A simulation study was conducted to investigate the performance of random forest (RF) and genomic BLUP (GBLUP) for genomic prediction of binary disease traits based on cow calibration groups. Training and testing sets were modified in different scenarios according to disease incidence, the quantitative-genetic background of the trait (h² = 0.30 and h² = 0.10), and the genomic architecture [725 quantitative trait loci (QTL) and 290 QTL; populations with high and low levels of linkage disequilibrium (LD)]. For all scenarios, 10,005 SNP (representing a low-density 10K SNP chip) and 50,025 SNP (representing a 50K SNP chip) were evenly spaced along 29 chromosomes. Training and testing sets included 20,000 cows (4,000 sick and 16,000 healthy, i.e., a disease incidence of 20%) from the last 2 generations. Initially, all 4,000 sick cows were assigned to the testing set, and the remaining 16,000 healthy cows formed the training set. In the subsequent allocation schemes, the number of sick cows in the training set increased stepwise. In each step, 10% of the sick animals were moved from the testing set to the training set, and the same number of healthy animals was moved from the training set to the testing set, so the sizes of the training and testing sets were kept constant. Evaluation criteria for both GBLUP and RF were the correlation between genomic breeding values and true breeding values (prediction accuracy) and the area under the receiver operating characteristic curve (AUROC). Prediction accuracy and AUROC increased for both methods in all scenarios as the percentage of sick cows allocated to the training set increased. The highest prediction accuracies were observed when the disease incidence in the training set reflected the population disease incidence of 0.20. For this allocation scheme, the largest prediction accuracies (0.53 for RF and 0.51 for GBLUP) and the largest AUROC (0.66 for RF and 0.64 for GBLUP) were achieved using 50,025 SNP, a heritability of 0.30, and 725 QTL.
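The stepwise allocation scheme can be made concrete with a little arithmetic. The following is a minimal sketch under two assumptions not stated explicitly in the abstract:

- "10% of the sick animals" means 10% of the total 4,000 sick cows, which is 400 per step.
- Each step moves an equal number of healthy cows from training to testing, which keeps the set sizes at 16,000 and 4,000.

The names `Allocation` and `allocation_schedule` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    """Hypothetical record of one allocation step (cow counts per set)."""
    step: int
    train_sick: int
    train_healthy: int
    test_sick: int
    test_healthy: int

    @property
    def train_incidence(self) -> float:
        """Share of sick cows in the training set."""
        return self.train_sick / (self.train_sick + self.train_healthy)


def allocation_schedule(n_sick: int = 4_000, n_healthy: int = 16_000,
                        fraction: float = 0.10) -> list[Allocation]:
    """Swap sick cows (test -> train) and healthy cows (train -> test) stepwise.

    Assumption: every step moves `fraction` of the *total* sick cows, and the
    same number of healthy cows goes the other way, so set sizes stay fixed.
    """
    batch = round(n_sick * fraction)   # 400 cows per step with the defaults
    n_steps = n_sick // batch          # 10 steps until all sick cows are in training
    schedule = []
    for step in range(n_steps + 1):
        moved = step * batch
        schedule.append(Allocation(
            step=step,
            train_sick=moved,
            train_healthy=n_healthy - moved,
            test_sick=n_sick - moved,
            test_healthy=moved,
        ))
    return schedule


if __name__ == "__main__":
    for a in allocation_schedule():
        print(f"step {a.step:2d}: train {a.train_sick:5d} sick / {a.train_healthy:5d} healthy "
              f"(incidence {a.train_incidence:.3f}); "
              f"test {a.test_sick:5d} sick / {a.test_healthy:5d} healthy")
```

Under these assumptions, the training incidence rises by 400 / 16,000 = 0.025 per step. At step 8 (3,200 sick and 12,800 healthy cows in training), it equals the population incidence of 0.20. This is the allocation the abstract associates with the highest prediction accuracies.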
Decreasing the heritability from 0.30 to 0.10 and reducing the number of QTL from 725 to 290 were associated with lower prediction accuracy and lower AUROC in all scenarios. This decrease was more pronounced for RF. In addition, an increase in LD had a stronger effect on RF results than on GBLUP results. The highest prediction accuracy in the low-LD scenario was 0.30 for RF and 0.36 for GBLUP, and it increased to 0.39 for both methods in the high-LD population. Random forest successfully identified important SNP located at close map distance to QTL that explained a high proportion of the phenotypic variation of the trait.
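The two evaluation criteria can be sketched in plain Python:

- **Prediction accuracy:** the Pearson correlation between genomic and true breeding values.
- **AUROC:** computed through its equivalence with the normalized Mann-Whitney U statistic.

This is a minimal illustration, not the authors' code. It assumes that genomic breeding values serve as ranking scores and that the observed disease status (1 = sick, 0 = healthy) serves as the label. The function names `prediction_accuracy` and `auroc` are hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def prediction_accuracy(gebv: Sequence[float], tbv: Sequence[float]) -> float:
    """Pearson correlation between genomic (GEBV) and true breeding values (TBV)."""
    n = len(gebv)
    if n != len(tbv) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mx = sum(gebv) / n
    my = sum(tbv) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(gebv, tbv))
    sxx = sum((x - mx) ** 2 for x in gebv)
    syy = sum((y - my) ** 2 for y in tbv)
    return sxy / math.sqrt(sxx * syy)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    `labels` are 1 for sick and 0 for healthy animals; tied scores get the
    average of their ranks, which counts each tied sick/healthy pair as 0.5.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the block of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs both sick and healthy animals")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # pairs where sick outranks healthy
    return u / (n_pos * n_neg)
```

For example, `auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` returns 0.75: three of the four sick/healthy pairs are ranked correctly. An AUROC of 0.5 corresponds to ranking no better than chance, which helps put the reported values of 0.64 to 0.66 in context.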

Bibliographic details

  • Source
    Journal of Dairy Science | 2016, No. 9 | pp. 7261-7273 | 13 pages
  • Authors

    S. Naderi; T. Yin; S. Koenig;

  • Affiliations

    Department of Animal Breeding, University of Kassel, 37213 Witzenhausen, Germany (all authors)

  • Indexed in: Science Citation Index (SCI); MEDLINE; Chemical Abstracts (CA)
  • Original format: PDF
  • Language: English
  • Keywords

    disease trait; random forest methodology; accuracy of genomic prediction;

