BMC Bioinformatics

Neural networks for link prediction in realistic biomedical graphs: a multi-dimensional evaluation of graph embedding-based approaches

Abstract

Link prediction in biomedical graphs has several important applications, including Drug-Target Interaction (DTI) prediction, Protein-Protein Interaction (PPI) prediction and Literature-Based Discovery (LBD). It can be done by using a classifier to output the probability of link formation between nodes. Recently, several works have used neural networks to create node representations that allow rich inputs to neural classifiers. Preliminary work along these lines has reported promising results. However, these works did not use realistic settings such as time-slicing, did not evaluate performance with comprehensive metrics, and did not explain when or why neural network methods outperform other approaches. We investigated how inputs from four node representation algorithms affect the performance of a neural link predictor on random- and time-sliced biomedical graphs of real-world size (~6 million edges) containing information relevant to DTI, PPI and LBD. We compared the performance of the neural link predictor with that of established baselines and report performance across five metrics. In both random- and time-sliced experiments, when the neural network methods were able to learn good node representations and there was a negligible number of disconnected nodes, those approaches outperformed the baselines. In the smallest graph (~15,000 edges) and in larger graphs with approximately 14% disconnected nodes, baselines such as Common Neighbours proved a justifiable choice for link prediction. At low recall levels (~0.3) the approaches were mostly equal, but at higher recall levels across all nodes, and in average performance at individual nodes, the neural network approaches were superior. Analysis showed that the neural network methods performed well on links between nodes with no previous common neighbours, potentially the most interesting links. Additionally, while neural network methods benefit from large amounts of data, they require considerable computational resources to exploit it.
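
As a hypothetical illustration of the time-sliced setting and the Common Neighbours baseline mentioned above (not the authors' code; the toy edge list, years and cutoff are invented for the example), the sketch below trains on edges dated before a cutoff, tests on edges that first appear at or after it, and scores each test pair by the number of neighbours it shares in the training graph. It also shows why such a baseline cannot rank links between nodes with no common neighbours, or links involving a node absent from the training graph: both score 0.

```python
from collections import defaultdict

def time_slice(timed_edges, cutoff):
    """Split (u, v, year) edges: train on edges dated before `cutoff`,
    test on undirected edges that first appear at or after it."""
    train = [(u, v) for u, v, t in timed_edges if t < cutoff]
    seen = {frozenset(e) for e in train}
    test = []
    for u, v, t in timed_edges:
        key = frozenset((u, v))
        if t >= cutoff and key not in seen:
            seen.add(key)  # avoid duplicate test pairs
            test.append((u, v))
    return train, test

def build_graph(edges):
    """Undirected adjacency sets from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def common_neighbours(adj, u, v):
    """Number of neighbours shared by u and v in the training graph.
    Nodes missing from the graph have no neighbours, so they score 0."""
    return len(adj.get(u, set()) & adj.get(v, set()))

if __name__ == "__main__":
    # Hypothetical toy data: (node, node, year the link was first observed)
    timed_edges = [("A", "B", 2000), ("B", "C", 2001), ("C", "D", 2002),
                   ("A", "C", 2010), ("A", "D", 2011), ("E", "A", 2012)]
    train, test = time_slice(timed_edges, cutoff=2005)
    adj = build_graph(train)
    for u, v in test:
        print(u, v, common_neighbours(adj, u, v))
```

In this toy run, A-C scores 1 (shared neighbour B), while A-D (no shared neighbour) and E-A (E never appears before the cutoff) both score 0, so the baseline cannot tell them apart. A learned node representation can, in principle, still assign such pairs different scores.
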
Our results indicate that when there is enough data for the neural network methods to use and there is a negligible number of disconnected nodes, those approaches outperform the baselines. At low recall levels the approaches are mostly equal, but at higher recall levels and in average performance at individual nodes, the neural network approaches are superior. This advantage is accounted for by their performance on nodes without common neighbours, which indicate more unexpected and perhaps more useful links.
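
The abstract contrasts performance ranked across all nodes with average performance at individual nodes, but does not define the metrics here. Assuming average precision (AP) as the measure (an assumption, not necessarily one of the paper's five metrics), the hypothetical sketch below computes one AP over a single global ranking of candidate links and, separately, the mean of per-node APs in which each source node's candidates are ranked on their own. The function names and toy scores are illustrative only.

```python
from collections import defaultdict

def average_precision(ranked_labels):
    """AP of a ranked list of 0/1 labels (1 = true link), averaged over the
    positives present in the list; 0.0 if there are none."""
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / hits if hits else 0.0

def global_ap(predictions):
    """Rank all (node, score, label) candidates together by score."""
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    return average_precision([label for _, _, label in ranked])

def mean_per_node_ap(predictions):
    """Rank each source node's candidates separately, then average the APs
    of nodes that have at least one true link."""
    by_node = defaultdict(list)
    for node, score, label in predictions:
        by_node[node].append((score, label))
    aps = []
    for items in by_node.values():
        items.sort(key=lambda item: item[0], reverse=True)
        labels = [label for _, label in items]
        if any(labels):
            aps.append(average_precision(labels))
    return sum(aps) / len(aps) if aps else 0.0

if __name__ == "__main__":
    # Hypothetical scores from some link predictor: (source node, score, is_true_link)
    predictions = [("A", 0.9, 1), ("A", 0.8, 0), ("A", 0.1, 1),
                   ("B", 0.95, 0), ("B", 0.7, 1)]
    print(round(global_ap(predictions), 3))         # 0.533
    print(round(mean_per_node_ap(predictions), 3))  # 0.667
```

The two summaries can diverge: in the global ranking, B's high-scoring false candidate sits above every true link, whereas the per-node mean confines that error to B's own ranking and weights each node equally.
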