Journal: Evolutionary Bioinformatics Online

Response Surface Analysis of Genomic Prediction Accuracy Values Using Quality Control Covariates in Soybean

Abstract

Genomic prediction (GP) is an important and widely used tool for selection and for increasing yield and genetic gain in plant breeding programs. It is a technique in which molecular marker information and phenotypic data are used to predict the phenotype (e.g., yield) of individuals for which only marker data are available. Higher prediction accuracy can be achieved not only with efficient models but also with high-quality molecular marker and phenotypic data. A typical quality control (QC) of marker data includes eliminating markers with a certain level of minor allele frequency (MAF) or of missing marker values, and imputing the remaining missing marker values. In this article, we evaluated how prediction accuracy is influenced by the combination of 12 MAF values, 27 different percentages of missing marker values, and 2 imputation techniques (ITs; naïve and Random Forest (RF)). We constructed a response surface of prediction accuracy values for the two ITs as a function of MAF and percentage of missing marker values, using soybean data from the University of Nebraska–Lincoln Soybean Breeding Program. We found that both the genetic architecture of the trait and the IT affect prediction accuracy, which implies that QC on the marker data has to be performed with care. For corresponding combinations of MAF and percentage of missing values, we observed that RF imputation increased the number of markers by a factor of 2 to 5 compared with the simple naïve imputation method, which is based on the mean allele dosage of the non-missing values at each locus. We conclude that no single strategy (combination of QC thresholds and imputation method) outperforms the others for all traits.
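
The QC steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes a genotype matrix of allele dosages (0, 1, 2) with NaN for missing calls, assumes that markers are dropped when their MAF falls below a threshold or their missing fraction exceeds a threshold, and assumes that filtering happens before imputation. The function names (marker_qc_mask, naive_impute, markers_retained_grid) and all threshold values are hypothetical. Only the naïve mean-dosage imputation is shown; RF imputation is not sketched.

```python
import numpy as np


def marker_qc_mask(genotypes, maf_threshold, max_missing):
    """Boolean mask of markers (columns) that pass the assumed QC filters.

    genotypes: (n_individuals, n_markers) float array of allele dosages
    in {0, 1, 2}, with np.nan marking a missing call.
    """
    missing = np.isnan(genotypes)
    missing_rate = missing.mean(axis=0)
    n_obs = (~missing).sum(axis=0)
    # Allele frequency from observed dosages only. A marker with no
    # observed calls gets frequency 0 here, hence MAF 0.
    freq = np.nansum(genotypes, axis=0) / np.maximum(2 * n_obs, 1)
    maf = np.minimum(freq, 1.0 - freq)
    return (maf >= maf_threshold) & (missing_rate <= max_missing)


def naive_impute(genotypes):
    """Fill missing calls with the mean dosage of the non-missing
    values at the same marker (the naive method named in the abstract)."""
    imputed = np.array(genotypes, dtype=float)
    missing = np.isnan(imputed)
    n_obs = (~missing).sum(axis=0)
    col_means = np.nansum(imputed, axis=0) / np.maximum(n_obs, 1)
    rows, cols = np.nonzero(missing)
    imputed[rows, cols] = col_means[cols]
    return imputed


def markers_retained_grid(genotypes, maf_grid, missing_grid):
    """Number of markers kept for every (MAF, missing fraction) pair."""
    counts = np.zeros((len(maf_grid), len(missing_grid)), dtype=int)
    for i, maf_threshold in enumerate(maf_grid):
        for j, max_missing in enumerate(missing_grid):
            mask = marker_qc_mask(genotypes, maf_threshold, max_missing)
            counts[i, j] = int(mask.sum())
    return counts


if __name__ == "__main__":
    nan = np.nan
    # Toy data: 4 individuals, 4 markers (hypothetical values).
    toy = np.array([
        [0.0, 2.0, 1.0, nan],
        [0.0, 2.0, nan, nan],
        [1.0, 2.0, 1.0, 2.0],
        [2.0, 2.0, 0.0, nan],
    ])
    keep = marker_qc_mask(toy, maf_threshold=0.05, max_missing=0.5)
    clean = naive_impute(toy[:, keep])
    print(keep)
    print(clean)
    print(markers_retained_grid(toy, [0.0, 0.05, 0.35], [0.0, 0.5, 1.0]))
```

In a study like the one summarized here, each cell of such a grid would be paired with a cross-validated prediction accuracy to form the response surface. The grid above only counts retained markers, since the abstract does not specify the prediction model or the threshold values.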
