Journal: BMC Bioinformatics

KnowLife: a versatile approach for constructing a large knowledge graph for biomedical sciences


Abstract

Biomedical knowledge bases (KBs) have become important assets in the life sciences. Prior work on KB construction has three major limitations. First, most biomedical KBs are manually built and curated, and cannot keep up with the rate at which new findings are published. Second, for automatic information extraction (IE), the text genre of choice has been scientific publications, neglecting sources like health portals and online communities. Third, most prior work on IE has focused on the molecular level or chemogenomics only, like protein-protein interactions or gene-drug relationships, or has solely addressed highly specific topics such as drug effects.

We address these three limitations with a versatile and scalable approach to automatic KB construction. Using a small number of seed facts for distant supervision of pattern-based extraction, we harvest a huge number of facts in an automated manner without requiring any explicit training. We extend previous techniques for pattern-based IE with confidence statistics, and we combine this recall-oriented stage with logical reasoning for consistency constraint checking to achieve high precision. To our knowledge, this is the first method that uses consistency checking for biomedical relations. Our approach can be easily extended to incorporate additional relations and constraints.

We ran extensive experiments not only for scientific publications, but also for encyclopedic health portals and online communities, creating different KBs based on different configurations. We assess the size and quality of each KB in terms of number of facts and precision. The best configured KB, KnowLife, contains more than 500,000 facts at a precision of 93% for 13 relations covering genes, organs, diseases, symptoms, treatments, as well as environmental and lifestyle risk factors.

KnowLife is a large knowledge base for health and life sciences, automatically constructed from different Web sources. As a unique feature, KnowLife is harvested from different text genres such as scientific publications, health portals, and online communities. Thus, it has the potential to serve as a one-stop portal for a wide range of relations and use cases. To showcase the breadth and usefulness, we make the KnowLife KB accessible through the health portal ( http://knowlife.mpi-inf.mpg.de ).
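The two-stage idea in the abstract (seed-driven pattern extraction scored with confidence statistics, followed by consistency checking) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the seed facts, the toy sentences, the confidence formula, the 0.5 threshold, and the single mutual-exclusion constraint are invented and are not taken from the paper. The abstract does not say how the logical reasoning is carried out, so a simple greedy rule stands in for it here.

```python
from collections import defaultdict

# Hypothetical seed facts as (relation, subject, object); toy data only.
SEEDS = {
    ("treats", "aspirin", "headache"),
    ("treats", "metformin", "diabetes"),
    ("causes", "smoking", "lung cancer"),
}

# Hypothetical text occurrences as (subject, pattern, object), assumed to be
# produced by an upstream entity-recognition step that is not shown here.
OCCURRENCES = [
    ("aspirin", "is used for", "headache"),
    ("metformin", "is used for", "diabetes"),
    ("ibuprofen", "is used for", "fever"),
    ("smoking", "leads to", "lung cancer"),
    ("asbestos", "leads to", "mesothelioma"),
    ("aspirin", "is linked to", "headache"),
    ("smoking", "is linked to", "lung cancer"),
    ("ibuprofen", "is linked to", "fever"),
]

# Assumed constraint: one entity pair cannot hold both relations at once.
MUTUALLY_EXCLUSIVE = [("treats", "causes")]


def pattern_confidence(seeds, occurrences):
    """Score each pattern per relation by its share of seed-matching hits."""
    seed_relations = defaultdict(set)
    for rel, subj, obj in seeds:
        seed_relations[(subj, obj)].add(rel)
    hits = defaultdict(lambda: defaultdict(int))
    for subj, pattern, obj in occurrences:
        for rel in seed_relations.get((subj, obj), ()):
            hits[pattern][rel] += 1
    confidence = {}
    for pattern, per_rel in hits.items():
        total = sum(per_rel.values())
        confidence[pattern] = {rel: n / total for rel, n in per_rel.items()}
    return confidence


def extract_candidates(occurrences, confidence, threshold=0.5):
    """Recall-oriented stage: apply every sufficiently confident pattern."""
    candidates = {}
    for subj, pattern, obj in occurrences:
        for rel, score in confidence.get(pattern, {}).items():
            if score >= threshold:
                key = (rel, subj, obj)
                candidates[key] = max(score, candidates.get(key, 0.0))
    return candidates


def enforce_consistency(candidates, exclusive_pairs):
    """Precision-oriented stage: keep the higher-scoring fact of each
    conflicting pair and drop both facts when the scores tie."""
    accepted = dict(candidates)
    for rel_a, rel_b in exclusive_pairs:
        for (rel, subj, obj), score in candidates.items():
            if rel != rel_a:
                continue
            other = (rel_b, subj, obj)
            if other not in candidates:
                continue
            other_score = candidates[other]
            if score >= other_score:
                accepted.pop(other, None)
            if other_score >= score:
                accepted.pop((rel, subj, obj), None)
    return accepted


confidence = pattern_confidence(SEEDS, OCCURRENCES)
candidates = extract_candidates(OCCURRENCES, confidence)
facts = enforce_consistency(candidates, MUTUALLY_EXCLUSIVE)
```

In this toy run the ambiguous pattern "is linked to" proposes both relations for the pair (ibuprofen, fever), and the constraint keeps only the candidate backed by the more reliable pattern "is used for". A real system would need many more seeds, patterns, and constraints than this sketch suggests.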
