International Joint Conference on Neural Networks

Machine learning approaches for the prediction of obesity using publicly available genetic profiles

Abstract

This paper presents a novel approach to predicting obesity based on the analysis of genetic variants drawn from publicly available genetic profiles and from a manually curated database, the National Human Genome Research Institute (NHGRI) Catalog. Using data science techniques, genetic variants are identified in the collected participant profiles and then matched against the risk variants indexed in the NHGRI Catalog. The indexed genetic variants, or single nucleotide polymorphisms (SNPs), are used as inputs to various machine learning algorithms for the prediction of obesity. The body mass index (BMI) status of participants is divided into two classes, a Normal Class and a Risk Class. Dimensionality reduction is performed to obtain a set of principal variables (13 SNPs) to which the machine learning methods are applied. The models are evaluated using receiver operating characteristic (ROC) curves and the area under the curve (AUC). Gradient boosting, a generalized linear model, classification and regression trees, k-nearest neighbours, support vector machines, random forest and a multilayer perceptron neural network are compared on their ability to identify the most important factors among the initial 6622 variables (describing genetic variants, age and gender) and to classify each subject into one of the BMI-related classes defined in this study. In our simulations, the support vector machine achieved the highest AUC, at 90.5%.
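The abstract ranks the classifiers by the area under the ROC curve. The following is a minimal sketch, not the authors' implementation, of how that AUC can be computed for the two BMI classes from its rank interpretation: the probability that a randomly chosen Risk Class subject receives a higher classifier score than a randomly chosen Normal Class subject, with ties counted as one half. The function names `roc_auc` and `best_model`, the 1/0 encoding of Risk/Normal, and the toy scores are hypothetical illustrations; libraries such as scikit-learn provide an equivalent routine (`sklearn.metrics.roc_auc_score`).

```python
from itertools import product
from typing import Mapping, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve for a binary classifier (hypothetical helper).

    labels: 1 for the Risk Class (positive), 0 for the Normal Class (assumed encoding).
    scores: one classifier score per subject; higher means "more likely Risk".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one subject in each class")
    # Count correctly ordered (Risk, Normal) pairs; a tie contributes one half.
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def best_model(
    labels: Sequence[int], model_scores: Mapping[str, Sequence[float]]
) -> Tuple[str, float]:
    """Return the (model name, AUC) pair with the highest AUC (hypothetical helper)."""
    aucs = {name: roc_auc(labels, s) for name, s in model_scores.items()}
    name = max(aucs, key=aucs.get)
    return name, aucs[name]
```

For instance, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four Risk/Normal pairs are ordered correctly. The pairwise loop costs O(n_pos × n_neg), which is adequate for a small cohort; a sort-based rank computation scales better.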