Journal: Frontiers in Plant Science

Exploring Deep Learning for Complex Trait Genomic Prediction in Polyploid Outcrossing Species



Abstract

Genomic prediction (GP) is the procedure whereby the genetic merits of untested candidates are predicted using genome-wide marker information. Although numerous examples of GP exist in plants and animals, applications to polyploid organisms are still scarce, partly due to limited genome resources and the complexity of this system. Deep learning (DL) techniques comprise a heterogeneous collection of machine learning algorithms that have excelled at many prediction tasks. A potential advantage of DL for GP over standard linear model methods is that DL can take into account all genetic interactions, including dominance and epistasis, which are expected to be of special relevance in most polyploids. In this study, we evaluated the predictive accuracy of linear and DL techniques in two important small fruits or berries: strawberry and blueberry. The two datasets contained a total of 1,358 allopolyploid strawberry (2n=8x=112) and 1,802 autopolyploid blueberry (2n=4x=48) individuals, genotyped for 9,908 and 73,045 single nucleotide polymorphism (SNP) markers, respectively, and phenotyped for five agronomic traits each. DL depends on numerous parameters that influence performance, and optimizing hyperparameter values can be a critical step. Here we show that interactions between hyperparameter combinations should be expected and that the number of convolutional filters and regularization in the first layers can have an important effect on model performance. In terms of genomic prediction, we did not find an advantage of DL over linear model methods, except when the epistasis component was important. Linear Bayesian models were better than convolutional neural networks for the fully additive architecture, whereas the opposite was observed under strong epistasis. However, by using a parameterization capable of taking into account these non-linear effects, Bayesian linear models can match or exceed the predictive accuracy of DL.
A semiautomatic implementation of the DL pipeline is available at https://github.com/lauzingaretti/deepGP/ .
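
To make the idea of an additive linear genomic prediction concrete, the following is a minimal sketch only. It is not the authors' pipeline and does not use the deepGP code. It assumes simulated tetraploid SNP dosages (0 to 4 copies of an allele), a purely additive trait, and plain ridge regression standing in for the Bayesian linear models mentioned in the abstract. The helper names `simulate` and `ridge_fit`, the sample sizes, the penalty `lam`, and the use of the correlation between predicted and observed phenotypes as the accuracy measure are all illustrative assumptions.

```python
import numpy as np


def simulate(n=300, p=50, ploidy=4, seed=0):
    """Simulate SNP dosages and a purely additive phenotype (hypothetical data)."""
    rng = np.random.default_rng(seed)
    # Dosage of the alternative allele at each marker: 0..ploidy copies.
    X = rng.binomial(ploidy, 0.5, size=(n, p)).astype(float)
    effects = rng.normal(0.0, 1.0, size=p)   # additive marker effects
    g = X @ effects                           # true genetic values
    # Noise scaled so that roughly 80% of phenotypic variance is genetic.
    y = g + rng.normal(0.0, 0.5 * g.std(), size=n)
    return X, y


def ridge_fit(X, y, lam):
    """Ridge regression on centered data; returns (intercept, marker effects)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    b = np.linalg.solve(A, Xc.T @ (y - y_mean))
    return y_mean - x_mean @ b, b


X, y = simulate()
train, test = slice(0, 200), slice(200, 300)  # "tested" vs "untested" candidates

intercept, b = ridge_fit(X[train], y[train], lam=1.0)
y_hat = intercept + X[test] @ b

# Accuracy measured here as the correlation between predictions and phenotypes.
accuracy = np.corrcoef(y_hat, y[test])[0, 1]
print(f"predictive correlation: {accuracy:.2f}")
```

Because the simulated trait is additive, a linear model of this kind should predict it well. Capturing dominance or epistasis would need extra terms (for example, products of marker dosages) or a non-linear model, which is the comparison the abstract describes.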
