PLoS Computational Biology

Benchmarking network propagation methods for disease gene identification


Abstract

In-silico identification of potential target genes for disease is an essential aspect of drug target discovery. Recent studies suggest that successful targets can be found by leveraging genetic, genomic and protein interaction information. Here, we systematically tested the ability of 12 varied algorithms, based on network propagation, to identify genes that have been targeted by any drug, using gene-disease data for 22 common non-cancerous diseases in OpenTargets. We considered two biological networks and six performance metrics, and compared two types of input gene-disease association scores. The impact of the design factors on performance was quantified through additive explanatory models. Standard cross-validation led to over-optimistic performance estimates due to the presence of protein complexes. In order to obtain realistic estimates, we introduced two novel protein complex-aware cross-validation schemes. When seeding biological networks with known drug targets, machine learning and diffusion-based methods found around 2-4 true targets within the top 20 suggestions. Seeding the networks with genes associated with disease by genetics decreased performance to below 1 true hit on average. The use of a larger network, although noisier, improved overall performance. We conclude that diffusion-based prioritisers and machine learning applied to diffusion-based features are suited for drug discovery in practice and improve over simpler neighbour-voting methods. We also demonstrate the large impact of choosing an adequate validation strategy and the definition of seed disease genes.
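
To make the ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not name the 12 algorithms, so random walk with restart is used here only as one familiar example of network propagation. The toy network, the function names (`random_walk_with_restart`, `hits_in_top_k`, `complex_aware_folds`), the restart probability and the fold-assignment rule are all assumptions made for illustration. In particular, keeping every member of a protein complex in the same fold is just one plausible reading of "complex-aware" cross-validation.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Propagate seed labels over a network and return one score per node.

    adj: symmetric adjacency matrix; seeds: indices of seed genes.
    The restart value is an illustrative assumption, not taken from the paper.
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid dividing by zero for isolated nodes
    W = adj / col_sums                     # column-normalised transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)     # start all probability mass on the seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

def hits_in_top_k(scores, true_targets, exclude, k=20):
    """Count true targets among the k best-scoring genes, skipping excluded genes."""
    ranked = [int(i) for i in np.argsort(-scores) if int(i) not in exclude]
    return len(set(ranked[:k]) & set(true_targets))

def complex_aware_folds(complex_ids, n_folds):
    """Assign genes to folds so that members of one complex share a fold.

    complex_ids: one complex label per gene (hypothetical input format).
    """
    fold_of_complex = {}
    folds = []
    for cid in complex_ids:
        if cid not in fold_of_complex:
            fold_of_complex[cid] = len(fold_of_complex) % n_folds
        folds.append(fold_of_complex[cid])
    return folds

# Toy network: a path of six genes, 0-1-2-3-4-5, seeded at gene 0.
adj = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    adj[a, b] = adj[b, a] = 1.0

scores = random_walk_with_restart(adj, seeds=[0])
print(hits_in_top_k(scores, true_targets={1, 5}, exclude={0}, k=2))
print(complex_aware_folds(["c1", "c1", "c2", "c3", "c2"], n_folds=2))
```

In this sketch the seed genes are excluded from the ranking before hits are counted, so a method is not rewarded for returning its own input. Grouping folds by complex addresses the over-optimism the abstract describes: if one member of a complex is a seed and another is held out, the held-out gene is trivially easy to recover through their direct interaction.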