Conference: IEEE International Conference on Computational Advances in Bio and Medical Sciences

Performance evaluation of different encoding strategies for quantitative genetic trait prediction

Abstract

Given the genotype values of a set of biallelic molecular markers, such as single nucleotide polymorphisms (SNPs), for a collection of plant, animal, or human samples, quantitative genetic traits of these samples, such as weight, height, or fruit size, can be predicted effectively. Quantitative genetic trait prediction has received great attention because it helps breeding companies develop more effective breeding strategies. Although much work has been proposed for the prediction task, relatively little attention has been paid to the effect of how genotype values are encoded. Quantitative genetic trait prediction is usually formulated as a linear regression model. In the regression model, genotypes must be encoded numerically according to their types: one heterozygous type and two homozygous types. The traditional encoding assigns 0 and 2 to the two homozygous types, respectively, and 1 to the heterozygous type. In this work, we evaluated five existing genetic encoding models as well as two recently proposed encoding methods that treat genetic trait prediction as a multiple regression problem on categorical data. We also discussed the scenario of epistasis, in which multiple markers can interact with each other. We evaluated the performance of five statistically intuitive encoding strategies and eight biologically oriented encoding strategies, as well as extensions of the two encoding methods mentioned above. We showed that, overall, the two recent encoding methods achieve better prediction accuracy in both the single-marker scenario and the epistasis scenario.
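
To make the idea of genotype encoding concrete, the following is a minimal sketch, not the authors' implementation. It contrasts the traditional 0/1/2 encoding described in the abstract with a generic dummy-variable (one-hot) encoding, used here only as a stand-in for treating genotypes as categorical data; the abstract does not specify the paper's actual encoding methods. The genotype labels (`AA`, `AB`, `BB`), the function names, the toy trait values, and the use of column products as interaction terms for epistasis are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical genotype labels for one biallelic marker:
# "AA" and "BB" are the two homozygous types, "AB" is the heterozygous type.
ADDITIVE = {"AA": 0, "AB": 1, "BB": 2}
CATEGORIES = ("AA", "AB", "BB")


def encode_additive(genotypes):
    """Traditional encoding: homozygous types -> 0 and 2, heterozygous -> 1."""
    return np.array([[ADDITIVE[g] for g in row] for row in genotypes], dtype=float)


def encode_categorical(genotypes):
    """Dummy (one-hot) encoding: three indicator columns per marker."""
    n, m = len(genotypes), len(genotypes[0])
    k = len(CATEGORIES)
    X = np.zeros((n, m * k))
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            X[i, j * k + CATEGORIES.index(g)] = 1.0
    return X


def add_pairwise_products(X):
    """Append the product of every pair of columns as a simple interaction term."""
    m = X.shape[1]
    pairs = [X[:, i] * X[:, j] for i in range(m) for j in range(i + 1, m)]
    return np.column_stack([X] + pairs) if pairs else X


def fit_least_squares(X, y):
    """Fit y ~ intercept + X @ beta by ordinary least squares.

    np.linalg.lstsq returns the minimum-norm solution, so the
    rank-deficient one-hot design is handled without extra care.
    """
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply a fitted coefficient vector (intercept first) to encoded genotypes."""
    return coef[0] + X @ coef[1:]


# Six samples, two markers. Trait values are invented purely for illustration.
genotypes = [
    ["AA", "AB"],
    ["AB", "BB"],
    ["BB", "AA"],
    ["AA", "BB"],
    ["AB", "AB"],
    ["BB", "AB"],
]
trait = np.array([1.0, 3.5, 2.0, 2.5, 3.0, 2.5])

X_add = encode_additive(genotypes)
X_cat = encode_categorical(genotypes)
X_epi = add_pairwise_products(X_add)

coef_add = fit_least_squares(X_add, trait)
coef_cat = fit_least_squares(X_cat, trait)

# Residual sum of squares on the training data for each encoding.
rss_add = float(np.sum((trait - predict(coef_add, X_add)) ** 2))
rss_cat = float(np.sum((trait - predict(coef_cat, X_cat)) ** 2))
```

On training data the one-hot fit can never have a larger residual than the 0/1/2 fit, because the additive column for each marker is a linear combination of that marker's indicator columns. That says nothing about prediction accuracy on held-out samples, which is what the paper evaluates.
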