Journal of Theoretical Biology

Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation


Abstract

Genotype imputation is an important tool for predicting unknown genotypes in both unrelated individuals and parent-offspring trios. Several imputation methods are available; they either employ universal machine learning methods or deploy algorithms dedicated to inferring missing genotypes. In this research, the performance of eight machine learning methods (Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine (ELM), Radial Basis Function (RBF), Random Forest, AdaBoost, LogitBoost, and TotalBoost) was compared in terms of imputation accuracy, computation time, and the factors affecting imputation accuracy. The methods were applied to real and simulated datasets to impute the un-typed SNPs in parent-offspring trios. The results show that imputation in parent-offspring trios can be accurate. Random Forest and Support Vector Machine were more accurate than the other machine learning methods. TotalBoost performed slightly worse than the other methods. Running times differed between methods. ELM was always the fastest algorithm. As the sample size increases, RBF requires a long imputation time. The methods tested in this research can be an alternative for imputing un-typed SNPs when the missing rate of the data is low. However, it is recommended that other machine learning methods be used for imputation. (C) 2016 Elsevier Ltd. All rights reserved.
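The idea of treating imputation as a classification problem can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes genotypes coded as 0/1/2, uses only K-Nearest Neighbors with a simple mismatch-count distance, ignores trio structure, and runs on made-up toy data. The function names (`hamming`, `knn_impute`, `imputation_accuracy`) are hypothetical.

```python
from collections import Counter


def hamming(a, b, skip):
    """Count mismatching genotypes between two samples, ignoring position `skip`."""
    return sum(1 for i, (x, y) in enumerate(zip(a, b)) if i != skip and x != y)


def knn_impute(reference, target, missing_idx, k=3):
    """Predict target[missing_idx] by majority vote among the k closest reference samples."""
    ranked = sorted(reference, key=lambda ref: hamming(ref, target, missing_idx))
    votes = Counter(ref[missing_idx] for ref in ranked[:k])
    return votes.most_common(1)[0][0]


def imputation_accuracy(reference, test, missing_idx, k=3):
    """Mask one SNP in every test sample, impute it, and return the fraction recovered."""
    hits = 0
    for sample in test:
        masked = list(sample)
        masked[missing_idx] = None  # pretend this SNP was not typed
        if knn_impute(reference, masked, missing_idx, k) == sample[missing_idx]:
            hits += 1
    return hits / len(test)


# Toy data (made up for illustration): SNP 2 tracks SNPs 0 and 1 in this panel.
reference = [
    [0, 0, 0, 1],
    [0, 0, 0, 2],
    [1, 1, 1, 0],
    [1, 1, 1, 2],
    [2, 2, 2, 0],
    [2, 2, 2, 1],
]
test = [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]

accuracy = imputation_accuracy(reference, test, missing_idx=2, k=2)
print(f"imputation accuracy: {accuracy:.2f}")
```

Imputation accuracy here is the fraction of masked genotypes that are recovered exactly, which is one common way to score such methods; the abstract does not state which accuracy measure the study used.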
