BMC Bioinformatics

Enriching a biomedical event corpus with meta-knowledge annotation

Abstract

Background: Biomedical papers contain rich information about entities, facts and events of biological relevance. To discover these automatically, we use text mining techniques, which rely on annotated corpora for training. To extract protein-protein interactions, genotype-phenotype and gene-disease associations, and similar relations, we rely on event corpora that are annotated with classified, structured representations of the important facts and findings contained within text. These corpora are an important resource for training domain-specific information extraction (IE) systems, which in turn facilitate semantic-based searching of documents. Correct interpretation of these events is not possible without additional information. For example, does an event describe a fact, a hypothesis, an experimental result or an analysis of results? How confident is the author about the validity of her analyses? These and other types of information, which we collectively term meta-knowledge, can be derived from the context of the event.

Results: We have designed an annotation scheme for meta-knowledge enrichment of biomedical event corpora. The scheme is multi-dimensional, in that each event is annotated for 5 different aspects of meta-knowledge that can be derived from the textual context of the event. Textual clues used to determine the values are also annotated. The scheme is intended to be general enough to allow integration with different types of bio-event annotation, whilst being detailed enough to capture important subtleties in the nature of the meta-knowledge expressed in the text. We report here on both the main features of the annotation scheme and its application to the GENIA event corpus (1000 abstracts with 36,858 events). High levels of inter-annotator agreement have been achieved, falling in the range of 0.84-0.93 Kappa.

Conclusion: By augmenting event annotations with meta-knowledge, more sophisticated IE systems can be trained, which allow interpretative information to be specified as part of the search criteria. This can assist in a number of important tasks, e.g., finding new experimental knowledge to facilitate database curation, or enabling textual inference to detect entailments and contradictions. To our knowledge, our scheme is unique within the field with regard to the diversity of meta-knowledge aspects annotated for each event.
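
The agreement figures quoted above (0.84-0.93 Kappa) can be made concrete with a small calculation. The abstract does not say which Kappa variant was used, so the sketch below assumes Cohen's kappa for two annotators labelling the same events along a single meta-knowledge dimension. The function name `cohen_kappa` and the label values are hypothetical, chosen only for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items (assumed variant)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for one dimension of four events (not from the corpus).
annotator_a = ["fact", "fact", "hypothesis", "hypothesis"]
annotator_b = ["fact", "fact", "hypothesis", "fact"]
print(cohen_kappa(annotator_a, annotator_b))
```

In a multi-dimensional scheme such as the one described, a statistic like this would presumably be computed once per dimension, which would yield a range of values rather than a single score.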