Conference: Asia-Pacific Bioinformatics Conference

Data-driven encoding for quantitative genetic trait prediction

Abstract

Motivation: Given a set of biallelic molecular markers, such as SNPs, with genotype values on a collection of plant, animal or human samples, the goal of quantitative genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Quantitative genetic trait prediction is usually formulated as a linear regression model, which requires quantitative encodings for the genotypes: the three distinct genotype values, corresponding to one heterozygous and two homozygous genotypes, are usually coded as integers and manipulated algebraically in the model. Further, epistasis between multiple markers is modeled as multiplication between the markers; it is unclear whether the regression model remains effective under this representation. In this work we investigate the effect of encodings on the quantitative genetic trait prediction problem.

Results: We first show that different encodings lead to different prediction accuracies in many test cases. We then propose a data-driven encoding strategy, in which we encode the genotypes according to their distribution in the phenotypes and allow each marker to have a different encoding. Our experiments show that this encoding strategy improves the performance of the genetic trait prediction method, and that it is more helpful for oligogenic traits, whose values rely on a relatively small set of markers. To the best of our knowledge, this is the first paper that discusses the effect of encodings on the genetic trait prediction problem.
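The abstract does not spell out how genotypes are encoded "according to their distribution in the phenotypes", so the following is only a minimal sketch of one plausible reading. It assumes that each genotype code (0, 1, 2) of a marker is replaced by the mean phenotype of the training samples carrying that genotype, and that a genotype unseen at a marker falls back to the overall mean. The function names `fit_marker_encodings` and `encode` are hypothetical and do not come from the paper.

```python
from statistics import mean


def fit_marker_encodings(genotypes, phenotypes):
    """Learn one 3-value encoding table per marker (assumed scheme, not the paper's).

    genotypes: list of samples, each a list of codes in {0, 1, 2} (one per marker).
    phenotypes: list of trait values, one per sample.
    Returns a list of [value_for_0, value_for_1, value_for_2], one per marker.
    """
    overall = mean(phenotypes)
    n_markers = len(genotypes[0])
    tables = []
    for j in range(n_markers):
        # Group the phenotypes by the genotype observed at marker j.
        groups = {0: [], 1: [], 2: []}
        for row, y in zip(genotypes, phenotypes):
            groups[row[j]].append(y)
        # Assumption: an unseen genotype is encoded as the overall mean.
        tables.append([mean(groups[g]) if groups[g] else overall
                       for g in (0, 1, 2)])
    return tables


def encode(genotypes, tables):
    """Replace each integer genotype code with its marker-specific value."""
    return [[tables[j][g] for j, g in enumerate(row)] for row in genotypes]


# Toy data: 4 samples, 2 markers (illustrative values only).
G = [[0, 1], [0, 2], [2, 1], [2, 2]]
y = [1.0, 3.0, 5.0, 7.0]
tables = fit_marker_encodings(G, y)
X = encode(G, tables)  # real-valued design matrix for a linear regression
```

Unlike a fixed 0/1/2 coding, each marker here gets its own three values, so the heterozygote need not sit midway between the two homozygotes. In any such scheme the tables should be learned from training samples only and then applied to test samples, since they are derived from the phenotypes.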