Conference: International Conference on Intelligent Computing

Evaluation of Phenotype Classification Methods for Obesity Using Direct to Consumer Genetic Data

Abstract

Direct-to-consumer genetic testing services are becoming increasingly common. Consumers of such services are sharing their genetic and clinical information with the research community to facilitate the extraction of knowledge about different conditions. In this paper, we build on these services to analyse the genetic data of people with different BMI levels and to determine the immediate and long-term risk factors associated with obesity. Using web scraping techniques, we create a dataset containing publicly available information about 230 participants from the Personal Genome Project. The dataset is then analysed to identify genetic variants associated with high BMI levels, following standard quality control and association analysis protocols for genome-wide association analysis. We apply a combination of a Random Forest based feature selection algorithm and a Support Vector Machine with a Radial Basis Function kernel to the filtered dataset. With this methodology, our approach identifies obesity-related genetic variants to be used as features when predicting individual obesity susceptibility. The results reveal that the subset of features obtained through the Random Forest based algorithm improves the performance of the classifier compared with the top statistically significant genetic variants identified by logistic regression. The Support Vector Machine showed the best results, with sensitivity = 81%, specificity = 83% and area under the curve = 92%, when the model was trained with the top fifteen features selected by Boruta.
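
The three figures reported above (sensitivity, specificity and area under the curve) can be illustrated with a minimal sketch. This is not the authors' code: the function names `sensitivity_specificity` and `roc_auc` are hypothetical, the labels and scores are invented toy values, and the 0.5 decision threshold is an assumption. The sketch only shows how such metrics are commonly computed for a binary obese / non-obese classifier.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = obese, 0 = not obese)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]  # hypothetical classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} auc={auc:.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the area under the curve summarises ranking quality across all thresholds, which is why a classifier can report an AUC noticeably higher than either of the other two values.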
