
Identifying disease genes using machine learning and gene functional similarities, assessed through Gene Ontology

Abstract

Identifying disease genes from the vast amount of available genetic data is one of the most challenging tasks in the post-genomic era. Moreover, complex diseases present highly heterogeneous genotypes, which makes the identification of biological markers difficult. Machine learning methods are widely used to identify these markers, but their performance depends heavily on the size and quality of the available data. In this study, we demonstrated that machine learning classifiers trained on gene functional similarities derived from Gene Ontology (GO) can improve the identification of genes involved in complex diseases. To this end, we developed a supervised machine learning methodology to predict complex disease genes and assessed the proposed pipeline using Autism Spectrum Disorder (ASD) candidate genes. Quantitative measures of gene functional similarity were obtained using different semantic similarity measures. To infer hidden functional similarities between ASD genes, various types of machine learning classifiers were built on the quantitative semantic similarity matrices of ASD and non-ASD genes. The classifiers trained and tested on ASD and non-ASD gene functional similarities outperformed previously reported ASD classifiers. For example, a Random Forest (RF) classifier achieved an AUC of 0.80 for predicting new ASD genes, higher than that of the previously reported classifier (0.73). Additionally, this classifier predicted 73 novel ASD candidate genes that were enriched for core ASD phenotypes, such as autism and obsessive-compulsive behavior. The predicted genes were also enriched for ASD co-occurring conditions, including Attention Deficit Hyperactivity Disorder (ADHD). We also developed a KNIME workflow implementing the proposed methodology, which allows users to configure and execute it without machine learning or programming skills.
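The abstract does not say which semantic similarity measures were used, so the following is only a hedged illustration of how a gene-by-gene functional similarity matrix can be built from GO annotations. It assumes a simple term-overlap measure in the spirit of simUI: each gene's direct annotations are propagated to all ancestor terms, and two genes are scored by the size of the intersection of their term sets divided by the size of the union. The ontology fragment `TOY_PARENTS`, the term IDs, the gene names, and all helper names are hypothetical placeholders, not real GO data or the authors' code. One plausible reading of the pipeline is that each gene's row in such a matrix serves as its feature vector for a classifier such as RF.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical toy ontology fragment: term -> set of parent terms (is_a edges).
# These IDs are placeholders, not real GO terms.
TOY_PARENTS: Dict[str, Set[str]] = {
    "T_root": set(),
    "T_a": {"T_root"},
    "T_b": {"T_root"},
    "T_a1": {"T_a"},
    "T_a2": {"T_a"},
    "T_b1": {"T_b"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return `term` together with all of its ancestors in the DAG."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def propagate(terms: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Expand direct annotations to include every ancestor term (true-path rule)."""
    expanded: Set[str] = set()
    for term in terms:
        expanded |= ancestors(term, parents)
    return expanded


def term_overlap(a: Set[str], b: Set[str]) -> float:
    """simUI-style score: size of intersection / size of union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(
    annotations: Dict[str, Set[str]], parents: Dict[str, Set[str]]
) -> Tuple[List[str], List[List[float]]]:
    """Return sorted gene names and the symmetric gene-by-gene similarity matrix."""
    genes = sorted(annotations)
    expanded = {g: propagate(annotations[g], parents) for g in genes}
    matrix = [[term_overlap(expanded[g], expanded[h]) for h in genes] for g in genes]
    return genes, matrix


if __name__ == "__main__":
    # Hypothetical genes with hypothetical direct annotations.
    toy_annotations = {
        "GENE_1": {"T_a1"},
        "GENE_2": {"T_a2"},
        "GENE_3": {"T_b1"},
    }
    names, sims = similarity_matrix(toy_annotations, TOY_PARENTS)
    for name, row in zip(names, sims):
        print(name, [round(x, 2) for x in row])
```

Here GENE_1 and GENE_2 share the ancestors `T_a` and `T_root` out of four distinct terms, so their score is 2/4 = 0.5, while either of them and GENE_3 share only `T_root` out of five terms (0.2). Information-content measures such as Resnik's weight rare, specific terms more heavily; the set-overlap measure is used here only because it needs no annotation corpus to compute term frequencies.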
Machine learning is an effective and reliable technique for deciphering ASD mechanisms by identifying novel disease genes, and this study further demonstrated that its performance can be improved by incorporating a quantitative measure of gene functional similarity. Source code and the workflow of the proposed methodology are available at https://github.com/Muh-Asif/ASD-genes-prediction.
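The AUC values quoted in the abstract (0.80 for the RF classifier versus 0.73 for the earlier one) can be read as the probability that a randomly chosen positive (disease) gene receives a higher classifier score than a randomly chosen negative gene. The sketch below computes AUC directly from that definition (the Mann-Whitney formulation, with ties counted as half); the function name and the labels and scores in the usage example are made-up illustrations, not data or code from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive (label 1) outscores a random negative (label 0).

    Ties count as half a correct ranking (Mann-Whitney U formulation).
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    # Compare every positive/negative pair of scores.
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Made-up labels (1 = disease gene, 0 = non-disease gene) and classifier scores.
    toy_labels = [1, 1, 0, 0]
    toy_scores = [0.9, 0.4, 0.6, 0.1]
    print(roc_auc(toy_labels, toy_scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

The pairwise loop is quadratic and meant only to make the definition explicit; library implementations such as scikit-learn's `roc_auc_score` compute the same quantity from a sorted ranking of the scores.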