The use of principal component analysis for predicting genomic breeding values


Abstract

During the last few years, the idea of predicting quantitative traits and diseases from genotypic information has attracted major interest in animal and plant breeding as well as in human genetics. However, important questions and problems still need to be addressed, and some of them are statistical. The statistical problems mainly concern multicollinearity arising from the huge amount of available data. In addition, the number of variables to be estimated (p) is much larger than the number of observations (n), which rules out ordinary least squares. Principal component analysis (PCA) is a multivariate statistical method often used to deal with these problems. The objective of this study was to investigate the use of PCA for predicting genomic breeding values. Data on 1,609 first-lactation Holstein heifers were analysed, including test-day milk, fat and protein yields. The animals originated from four countries (Ireland, the United Kingdom, the Netherlands and Sweden) and were genotyped within the RobustMilk project with the Illumina BovineSNP50 Beadchip. After editing, 37,069 SNPs remained. Two different models were compared for genomic prediction. i) Principal component regression (PCR) was used to estimate genomic breeding values directly, with principal components (PCs) selected either by their eigenvalues or by their contribution to the regression sum of squares (SS). ii) A best linear unbiased prediction model with a genomic relationship matrix (GBLUP) was developed so that its accuracies could be compared with those of the PCR models. In a third case, PCs extracted from the G-matrix were added to the GBLUP model as fixed effects to investigate the impact of population structure on genomic prediction. For validation, the dataset was split into four training (reference population) and testing sets. Each testing set contained all animals from a single country.
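The PCR step described above can be sketched as follows. This is a minimal illustration, not the author's implementation. It assumes several things: genotypes are coded 0/1/2 in NumPy arrays, PCs are computed by SVD of the centred training genotypes, and test animals are projected with the training means and loadings. The function name `pcr_fit_predict` and its `select` parameter are hypothetical. Because the retained PC scores are orthogonal and few (k much smaller than n), ordinary least squares on them is well posed even though p is much larger than n.

```python
import numpy as np

def pcr_fit_predict(X_train, y_train, X_test, n_pcs, select="eigenvalue"):
    """Principal component regression (PCR) for genomic prediction, a sketch.

    X_train, X_test: animals x SNPs genotype matrices (assumed coded 0/1/2).
    y_train: phenotypes of the training (reference) animals.
    select: "eigenvalue" keeps the n_pcs PCs with the largest eigenvalues;
            "ss" keeps the n_pcs PCs contributing most to the regression SS.
    """
    # Centre genotypes and phenotypes on the training means
    mu_x, mu_y = X_train.mean(axis=0), y_train.mean()
    Xc, yc = X_train - mu_x, y_train - mu_y

    # Thin SVD: Xc = U diag(S) Vt. Rows of Vt are PC loadings; eigenvalues of
    # the genotype covariance are S**2 / (n - 1), already in descending order.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    keep = S > S[0] * 1e-10          # drop numerically null components
    U, S, Vt = U[:, keep], S[keep], Vt[keep]
    k = min(n_pcs, len(S))

    if select == "eigenvalue":
        idx = np.arange(k)           # first k PCs = largest eigenvalues
    elif select == "ss":
        # PC scores t_j = u_j * s_j are orthogonal, so the SS explained by
        # PC j alone is (t_j' y)^2 / (t_j' t_j) = (u_j' y)^2.
        ss = (U.T @ yc) ** 2
        idx = np.argsort(ss)[::-1][:k]
    else:
        raise ValueError("select must be 'eigenvalue' or 'ss'")

    V_k = Vt[idx].T                  # SNPs x k loading matrix
    T = Xc @ V_k                     # animals x k PC scores
    # Least squares on k orthogonal scores instead of p collinear SNPs
    gamma, *_ = np.linalg.lstsq(T, yc, rcond=None)
    beta = V_k @ gamma               # back-transform to per-SNP effects
    return mu_y + (X_test - mu_x) @ beta

# Usage with simulated data (not the study's data)
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(100, 2000)).astype(float)
y = X @ rng.normal(0.0, 0.05, 2000) + rng.normal(0.0, 1.0, 100)
pred = pcr_fit_predict(X[:80], y[:80], X[80:], n_pcs=2)
```

With `select="eigenvalue"`, `n_pcs` plays the role of the number of leading PCs (one or two gave the best results for some traits in the abstract). With `select="ss"`, PCs are ranked by how much phenotypic variation each explains instead.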
Predictive ability was calculated as the Pearson correlation between the predicted genomic values and the phenotypes. PCR with PCs selected by their eigenvalues achieved high accuracies and outperformed both the PCR (SS) and GBLUP models. Accuracies varied between populations and traits. Interestingly, the highest accuracies were obtained for GBR, the only population in the dataset that PCA identified as genetically distinct. For that population, only the first PC was used for protein yield and only the first two PCs for milk yield. In the GBLUP models, adding PCs increased accuracy in all cases (by ~40% on average). The main benefits of PCR are its simplicity and fast computation. It also reduces data dimensionality (by >96%) and can both predict breeding values and identify groups in the data. Combined with predictions at least as accurate as GBLUP on real data, these features make PCR an attractive tool for animal breeding. However, the number of PCs needed to achieve the highest accuracies varied, and this could be a drawback of the method. The highest accuracies were obtained for the only group of animals genetically separated from the rest. On that basis, we hypothesize that PCR could be tested for across-breed genomic predictions.
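The validation scheme can be sketched as below. Each country is held out in turn as the testing set, and predictive ability is computed as the Pearson correlation between predictions and phenotypes. This is an illustrative sketch under assumptions: all function names are hypothetical, and `ridge_fit_predict` is a ridge-regression stand-in used only to make the example runnable. It is not the GBLUP or PCR model of the study. Any function with the signature `fit_predict(X_train, y_train, X_test)` can be passed instead.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

def leave_one_group_out(X, y, groups, fit_predict):
    """Predictive ability for each held-out group (e.g. country).

    fit_predict(X_train, y_train, X_test) must return predictions for X_test.
    Each group in turn forms the testing set; all remaining animals form the
    training (reference) population.
    """
    groups = np.asarray(groups)
    results = {}
    for g in np.unique(groups):
        test = groups == g
        pred = fit_predict(X[~test], y[~test], X[test])
        results[str(g)] = pearson(pred, y[test])
    return results

def ridge_fit_predict(X_train, y_train, X_test, lam=1.0):
    """Hypothetical stand-in model: ridge regression on all SNPs, dual form."""
    mu_x, mu_y = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - mu_x
    n = len(y_train)
    # beta = Xc' (Xc Xc' + lam*I)^-1 (y - mu_y): an n x n solve, cheap when n << p
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), y_train - mu_y)
    return mu_y + (X_test - mu_x) @ (Xc.T @ alpha)

# Usage with simulated data (not the study's data)
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(80, 500)).astype(float)
y = X @ rng.normal(0.0, 0.1, 500) + rng.normal(0.0, 1.0, 80)
countries = np.repeat(["IRL", "GBR", "NLD", "SWE"], 20)
print(leave_one_group_out(X, y, countries, ridge_fit_predict))
```

Holding out a whole country, rather than random animals, tests how well a reference population from other countries predicts a partly distinct population. That is why population structure matters in this design.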

Bibliographic record

  • Author: Dadoudis Christos
  • Year: 2012
  • Original format: PDF
  • Language: English
