Source: U.S. National Institutes of Health literature, BMC Genomics

Hypotheses generation as supervised link discovery with automated class labeling on large-scale biomedical concept networks

Abstract

Computational approaches to generating hypotheses from biomedical literature have been studied intensively in recent years. Nevertheless, automatically discovering novel, cross-silo biomedical hypotheses from large-scale literature repositories remains a challenge. To address this challenge, we first model a biomedical literature repository as a comprehensive network of biomedical concepts and formulate hypotheses generation as a process of link discovery on that concept network. We extract the relevant information from the biomedical literature corpus and generate a concept network and a concept-author map on a cluster using the Map-Reduce framework. From these we extract a set of heterogeneous features, such as random-walk-based features, neighborhood features, and common-author features. Because the number of candidate links to consider in the concept network is large, we address the scalability problem by also extracting the features on a cluster with the Map-Reduce framework. We further model link discovery as a classification problem carried out on a training data set automatically extracted from two network snapshots taken over two consecutive time periods. The heterogeneous features, which cover both topological and semantic characteristics of the concept network, are studied with respect to their impact on the accuracy of the proposed supervised link discovery process. The paper also presents a case study of hypotheses generation based on the proposed method.
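The automated class labeling described in the abstract can be illustrated with a small in-memory sketch. It is not the authors' implementation: the paper works on a cluster with Map-Reduce and also uses random-walk-based features, both omitted here. The function names (`build_adjacency`, `labeled_pairs`), the edge-list input format, and the choice of Jaccard similarity as a neighborhood feature are assumptions made for illustration. A concept pair that is unlinked in the earlier snapshot is labeled positive if it becomes linked in the later snapshot, and negative otherwise.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {concept: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def labeled_pairs(edges_t1, edges_t2, concept_authors):
    """Return (pair, features, label) rows for pairs unlinked in snapshot 1.

    features = [common neighbors, Jaccard similarity, shared authors]
    label    = 1 if the pair is linked in snapshot 2, else 0
    All feature choices here are illustrative assumptions.
    """
    adj1 = build_adjacency(edges_t1)
    future_links = {frozenset(edge) for edge in edges_t2}
    rows = []
    for u, v in combinations(sorted(adj1), 2):
        if v in adj1[u]:
            continue  # already linked in the first snapshot, not a candidate
        neighbors_u, neighbors_v = adj1[u], adj1[v]
        common = len(neighbors_u & neighbors_v)
        union = len(neighbors_u | neighbors_v)
        jaccard = common / union if union else 0.0
        shared_authors = len(
            concept_authors.get(u, set()) & concept_authors.get(v, set())
        )
        label = 1 if frozenset((u, v)) in future_links else 0
        rows.append(((u, v), [common, jaccard, shared_authors], label))
    return rows
```

The resulting rows can be passed to any standard classifier. Enumerating all pairs is quadratic in the number of concepts, which is the scalability problem the abstract says motivates extracting features on a cluster.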