
Terminological resources for text mining over biomedical scientific literature



Abstract

Objective: We present a combined terminological resource for text mining over biomedical literature. The purpose of the resource is to allow the detection of mentions of specific biological entities in scientific publications, and their grounding to widely accepted identifiers. This is an essential process, useful in itself, and necessary as an intermediate step for almost every type of complex text mining application.

Methods: We discuss some of the properties of the terminology for this domain, in particular the degree of ambiguity, which constitutes a peculiar problem for text mining applications. Without a correct recognition and disambiguation of the domain entities no reliable results can be produced.

Results: We also discuss an application that makes use of the resulting terminological knowledge base. We annotate an existing corpus of sentences about protein interactions. The annotation consists of a normalization step that matches the terms in our resource with their actual representation in the corpus, and a disambiguation step that resolves the ambiguity of matched terms.

Conclusion: In this paper we present a large terminological resource, compiled through the aggregation of a number of different manually curated sources. We discuss the lexical properties of such resources, specifically the degree of ambiguity of the terms, and we inspect the causes of such ambiguity, in particular for protein names. This information is of vital importance for the implementation of an efficient term normalization and grounding algorithm.
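The two annotation steps named in the abstract, normalization and disambiguation, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy `TERMS` list, the identifiers, the normalization rule (lowercase, drop non-alphanumerics), and the type-preference heuristic in `disambiguate` are hypothetical and are not taken from the paper, which does not describe its algorithm in the abstract.

```python
import re
from collections import defaultdict

# Hypothetical toy terminology: (surface form, identifier). Invented data.
TERMS = [
    ("IL-2", "PROT:0001"),
    ("interleukin 2", "PROT:0001"),
    ("p53", "PROT:0002"),
    ("p53", "GENE:0009"),  # same surface form, second identifier: ambiguous
]

def normalize(term):
    """Assumed normalization rule: lowercase and drop non-alphanumerics."""
    return re.sub(r"[^a-z0-9]", "", term.lower())

def build_index(terms):
    """Map each normalized surface form to the set of identifiers it may denote."""
    index = defaultdict(set)
    for surface, ident in terms:
        index[normalize(surface)].add(ident)
    return index

def ambiguous_terms(index):
    """Normalized terms that ground to more than one identifier."""
    return {k: sorted(v) for k, v in index.items() if len(v) > 1}

def annotate(sentence, index, max_len=3):
    """Greedy longest match of token spans against the normalized index."""
    tokens = re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", sentence)
    hits, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            key = normalize(span)
            if key in index:
                hits.append((span, sorted(index[key])))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return hits

def disambiguate(candidates, preferred_prefix="PROT"):
    """Stand-in heuristic: keep the single candidate of the preferred type, if any."""
    kept = [c for c in candidates if c.startswith(preferred_prefix + ":")]
    return kept[0] if len(kept) == 1 else None

index = build_index(TERMS)
for span, candidates in annotate("Interleukin 2 binds P53.", index):
    print(span, candidates, disambiguate(candidates))
```

Preferring one entity type is only a placeholder for a real disambiguation step; it fits a protein-interaction corpus loosely, but the count returned by `ambiguous_terms` is the quantity a real resource would need to examine first.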

Bibliographic details

  • Source
    Artificial intelligence in medicine | 2011, Issue 2 | pp. 107-114 | 8 pages
  • Author affiliations

    Institute of Computational Linguistics, University of Zurich, Binzmühlestrasse 14, CH-8050 Zurich, Switzerland;

    Institute of Computational Linguistics, University of Zurich, Binzmühlestrasse 14, CH-8050 Zurich, Switzerland;

    Department of Computer and Information Science (IDI), Norwegian University of Science and Technology (NTNU), Sem Sælands vei 7-9, NO-7491 Trondheim, Norway; Tsujii Laboratory, Department of Computer Science, University of Tokyo, 7-3-1 Hongo, Bunkyo-Ku, Tokyo 113-0033, Japan;

  • Original format: PDF
  • Language: English
  • Keywords

    Information extraction; Text mining; Terminological resources;

