Journal: Research Opinions in Animal & Veterinary Sciences

Analyses and comparison of K-nearest neighbour and AdaBoost algorithms for genotype imputation


Abstract

Genomic selection has become a standard tool in dairy cattle breeding. For other animal species, however, implementation of this technology is hindered by the high cost of genotyping. Genotype imputation is the prediction of genotypes at single nucleotide polymorphism (SNP) locations that were not directly assayed in a sample of individuals, whether those individuals are unrelated or form parent-offspring trios. Several imputation methods designed for livestock populations are available. Machine learning methods have been used in genetic studies to build models capable of predicting missing values of a marker. In this study, strategies and factors affecting the imputation accuracy of parent-offspring trios were compared using two machine learning methods, K-nearest neighbour (KNN) and AdaBoost (AB). The methods were applied to simulated data to impute the un-typed SNPs in parent-offspring trios. Two datasets, D1 (100 trios with 5k SNPs) and D2 (500 trios with 5k SNPs), were simulated. The methods were compared in terms of imputation accuracy, computation time, and one factor affecting imputation accuracy (sample size). KNN outperformed AB in imputation accuracy. Computation time differed between the methods, and KNN was the faster algorithm. Imputation accuracy increased with the number of trios. Results on the simulated datasets showed that the methods performed very well for imputation of un-typed SNPs and can be used as an alternative to other methods for imputation of parent-offspring trios.
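The KNN idea described in the abstract can be illustrated with a small sketch. The abstract does not give the simulation procedure, the distance metric, the value of k, or the feature encoding, so everything below is an assumption made for illustration: genotypes are coded as allele counts (0/1/2), loci are simulated independently with no linkage disequilibrium, parents serve as the reference panel, and the names `simulate_trio`, `knn_impute`, and `imputation_accuracy` are hypothetical. It is not the authors' implementation, and its accuracy is not comparable to the reported results.

```python
import random
from collections import Counter


def simulate_trio(freqs, rng):
    """Simulate one trio; genotypes are coded as allele counts 0/1/2."""
    def haplotypes():
        # Two haplotypes per parent; loci are independent (no LD modelled).
        return [[int(rng.random() < f) for f in freqs] for _ in range(2)]

    def genotype(haps):
        return [a + b for a, b in zip(haps[0], haps[1])]

    sire, dam = haplotypes(), haplotypes()
    # The offspring inherits one randomly chosen allele per locus from each parent.
    child = [[parent[rng.randrange(2)][j] for j in range(len(freqs))]
             for parent in (sire, dam)]
    return genotype(sire), genotype(dam), genotype(child)


def knn_impute(reference, individual, target_snp, k=5):
    """Predict the genotype at target_snp by majority vote among the k
    reference individuals closest on all other SNPs (Manhattan distance)."""
    def distance(ref):
        return sum(abs(r - g)
                   for j, (r, g) in enumerate(zip(ref, individual))
                   if j != target_snp)

    nearest = sorted(reference, key=distance)[:k]
    votes = Counter(ref[target_snp] for ref in nearest)
    return votes.most_common(1)[0][0]


def imputation_accuracy(n_trios=100, n_snps=50, k=5, seed=1):
    """Mask SNP 0 in every offspring and impute it from the parents' panel."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
    trios = [simulate_trio(freqs, rng) for _ in range(n_trios)]
    reference = [g for sire, dam, _ in trios for g in (sire, dam)]
    hits = sum(knn_impute(reference, child, 0, k) == child[0]
               for _, _, child in trios)
    return hits / n_trios
```

Calling `imputation_accuracy(n_trios=100)` and `imputation_accuracy(n_trios=500)` mirrors the trio counts of D1 and D2 (with far fewer SNPs than 5k) and gives a rough way to explore the effect of sample size under these toy assumptions.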
