BMC Bioinformatics

Semantically linking molecular entities in literature through entity relationships

Abstract

Background

Text mining tools have gained popularity for processing the vast number of research articles available in the biomedical literature. It is crucial that such tools extract information with a sufficient level of detail to be applicable in real-life scenarios. Studies of mining non-causal molecular relations contribute to this goal by formally identifying the relations between genes, promoters, complexes and various other molecular entities found in text. More importantly, these studies help to enhance the integration of text mining results with database facts.

Results

We describe, compare and evaluate two frameworks developed for the prediction of non-causal or 'entity' relations (REL) between gene symbols and domain terms. For the corresponding REL challenge of the BioNLP Shared Task of 2011, these systems ranked first (57.7% F-score) and second (41.6% F-score). In this paper, we investigate the performance discrepancy of 16 percentage points by benchmarking on a related and more extensive dataset, analysing the contribution of both the term detection and relation extraction modules. We further construct a hybrid system combining the two frameworks and experiment with intersection and union combinations, achieving high-precision and high-recall results, respectively. Finally, we highlight extremely high-performance results (F-score > 90%) obtained for the specific subclass of embedded entity relations that are essential for integrating text mining predictions with database facts.

Conclusions

The results from this study will enable us in the near future to annotate semantic relations between molecular entities in the entire scientific literature available through PubMed. The recent release of the EVEX dataset, containing biomolecular event predictions for millions of PubMed articles, is an interesting and exciting opportunity to overlay these entity relations with event predictions on a literature-wide scale.
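
The intersection and union combinations mentioned in the Results can be illustrated with a minimal sketch. It assumes each system's output is a set of (gene symbol, domain term, relation type) tuples; that representation, the function names, and the toy data are all hypothetical and are not taken from the paper or from either framework.

```python
from typing import Set, Tuple

# Hypothetical representation: (gene symbol, domain term, relation type).
Relation = Tuple[str, str, str]


def combine(a: Set[Relation], b: Set[Relation], mode: str) -> Set[Relation]:
    """Merge two systems' predictions by set intersection or set union."""
    if mode == "intersection":
        return a & b  # keep only relations both systems predict
    if mode == "union":
        return a | b  # keep relations predicted by either system
    raise ValueError(f"unknown mode: {mode}")


def precision_recall_f1(predicted: Set[Relation], gold: Set[Relation]):
    """Exact-match precision, recall and F-score against a gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented toy data, for illustration only (not results from the study).
gold = {
    ("GENE_A", "promoter", "REL_1"),
    ("GENE_B", "complex", "REL_2"),
    ("GENE_C", "promoter", "REL_1"),
    ("GENE_D", "complex", "REL_2"),
}
system_a = {
    ("GENE_A", "promoter", "REL_1"),
    ("GENE_B", "complex", "REL_2"),
    ("GENE_X", "promoter", "REL_1"),  # false positive
}
system_b = {
    ("GENE_B", "complex", "REL_2"),
    ("GENE_C", "promoter", "REL_1"),
    ("GENE_Y", "complex", "REL_2"),  # false positive
}

for mode in ("intersection", "union"):
    p, r, f = precision_recall_f1(combine(system_a, system_b, mode), gold)
    print(f"{mode}: precision={p:.2f} recall={r:.2f} F-score={f:.2f}")
```

On this toy data the intersection keeps only the relation both systems agree on, so precision rises while recall falls; the union does the opposite. That is the general trade-off the abstract describes, shown here without reproducing any of the study's actual figures.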
