Frontiers in Genetics

Genomic Prediction Accuracy Using Haplotypes Defined by Size and Hierarchical Clustering Based on Linkage Disequilibrium


Abstract

Genomic prediction is an effective way to estimate genomic breeding values from genetic information using statistical methods such as best linear unbiased prediction (BLUP). Using haplotypes, clusters of linked single nucleotide polymorphisms (SNPs), as markers instead of individual SNPs can improve the accuracy of genomic prediction, because a quantitative trait locus is more likely to be in strong linkage disequilibrium (LD) with a cluster of markers than with an individual marker. To make haplotypes efficient in genomic prediction, finding optimal ways to define them is essential. In this study, 770K or 50K SNP chip data were collected from a Hanwoo (Korean cattle) population of 3,498 animals. Using the SNP chip data, haplotypes were defined in three different ways, based on 1) the number of SNPs included, 2) the length of the haplotype (bp), and 3) agglomerative hierarchical clustering based on LD. To compare the methods in parallel, haplotypes defined by all methods were set to have comparable sizes: 5, 10, 20 or 50 SNPs on average per haplotype. A linear mixed model in which the covariance matrix was calculated from haplotypes was applied to test the prediction accuracy of each haplotype size. A conventional SNP-based linear mixed model was also tested to evaluate the performance of the haplotype sets in genomic prediction. Carcass weight (CWT), eye muscle area (EMA) and backfat thickness (BFT) were used as the phenotypes. Haplotypes generally gave higher accuracy than the conventional SNP-based model for CWT and EMA, but little or no increase in accuracy for BFT. LD clustering-based haplotypes, specifically those averaging five SNPs, showed the highest prediction accuracy for CWT and EMA, whereas length-based haplotypes with five SNPs gave the highest accuracy for BFT. The maximum gain in accuracy was 1.3% from cross-validation and 4.6% from forward validation for EMA, suggesting that genomic prediction accuracy can be increased by using haplotypes. However, the improvement from using haplotypes may depend on the trait of interest. In addition, when the numbers of alleles generated by the haplotype-defining methods were compared, clustering by LD generated the fewest alleles, thereby reducing computational costs. Therefore, finding optimal ways to define haplotypes and using the haplotype alleles as markers can improve the accuracy of genomic prediction.
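
The first two haplotype definitions in the abstract (a fixed number of SNPs, and a fixed length in bp) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy phased gametes, and the SNP positions are all hypothetical, and the LD-based hierarchical clustering is left out. The sketch only shows how the choice of block definition changes the number of haplotype alleles that would enter a model as markers.

```python
from collections import Counter


def blocks_by_snp_count(n_snps, size):
    """Consecutive blocks of `size` SNPs each (the last block may be shorter)."""
    return [list(range(start, min(start + size, n_snps)))
            for start in range(0, n_snps, size)]


def blocks_by_length(positions_bp, window_bp):
    """Group SNP indices whose positions fall in the same fixed-length window.

    Assumes positions_bp is sorted ascending and covers one chromosome.
    """
    blocks = {}
    for idx, pos in enumerate(positions_bp):
        window = (pos - positions_bp[0]) // window_bp
        blocks.setdefault(window, []).append(idx)
    return list(blocks.values())


def haplotype_alleles(gametes, block):
    """Count each distinct haplotype allele observed in one block.

    gametes: list of phased gametes, each a sequence of 0/1 SNP alleles.
    """
    return Counter(tuple(g[i] for i in block) for g in gametes)


# Hypothetical toy data: 4 phased gametes (2 animals) x 6 SNPs.
gametes = [
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 1],
]
positions_bp = [100, 250, 900, 1000, 1200, 2150]

by_count = blocks_by_snp_count(n_snps=6, size=3)
by_length = blocks_by_length(positions_bp, window_bp=1000)

for name, blocks in [("count", by_count), ("length", by_length)]:
    n_alleles = sum(len(haplotype_alleles(gametes, b)) for b in blocks)
    print(name, blocks, "total haplotype alleles:", n_alleles)
```

On real data the gametes would come from a phasing step, and the per-block allele counts would be turned into marker covariates or a relationship matrix for the mixed model; both steps are outside this sketch.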
