Conference: International Conference on Language Resources and Evaluation (LREC)

Iterative Refinement and Quality Checking of Annotation Guidelines: How to Deal Effectively with Semantically Sloppy Named Entity Types, such as Pathological Phenomena



Abstract

Here we discuss a methodology for annotating named entity types that are semantically hard to delineate, i.e., sloppy. To illustrate entity sloppiness, we treat an example from the medical domain, namely pathological phenomena. Based on our experience with iterative guideline refinement, we propose to carefully characterize the thematic scope of the annotation by positive and negative coding lists and to allow for alternative short vs. long mention span annotations. Short spans account for canonical entity mentions (e.g., standardized disease names), while long spans cover descriptive text snippets that contain entity-specific elaborations (e.g., anatomical locations, observational details). Using this stratified approach, we provide evidence of increasing annotation performance through re-based inter-annotator agreement measurements over several iterative annotation rounds with continuously refined guidelines. This increase reflects a growing understanding of the sloppy entity class from the perspective of both guideline writers and guideline users (annotators). Based on our data, we have gathered evidence that sloppiness can be dealt with in a controlled manner, and we expect inter-annotator agreement values of around 80% for PathoJen, the pathological phenomena corpus currently under development in our lab.
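The abstract does not state which agreement measure or span-matching criterion underlies the inter-annotator agreement figures. With that caveat, the following is a minimal Python sketch that assumes pairwise F1 between two annotators (a common choice for span annotation) and contrasts strict exact-boundary matching with a lenient overlap criterion. The function names, the example sentence, and the annotations are hypothetical; the toy data only shows how two annotators can agree exactly on a short, canonical span yet disagree on the boundary of a long, descriptive one.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: a mention span as (start, end) character offsets, end exclusive.
Span = Tuple[int, int]


def exact_match(a: Span, b: Span) -> bool:
    """Strict criterion: both boundaries must coincide."""
    return a == b


def overlap_match(a: Span, b: Span) -> bool:
    """Lenient criterion: the spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def agreement_f1(ann_a: List[Span], ann_b: List[Span],
                 match: Callable[[Span, Span], bool]) -> float:
    """Pairwise F1 agreement between two annotators' spans (assumed metric).

    Each span of annotator A is greedily paired with at most one
    not-yet-used span of annotator B that satisfies `match`.
    """
    if not ann_a and not ann_b:
        return 1.0  # both annotators marked nothing: trivial agreement
    unused_b = list(ann_b)
    matched = 0
    for a in ann_a:
        for b in unused_b:
            if match(a, b):
                matched += 1
                unused_b.remove(b)  # safe: we stop iterating right after
                break
    precision = matched / len(ann_b) if ann_b else 0.0
    recall = matched / len(ann_a) if ann_a else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def span_of(text: str, phrase: str) -> Span:
    """Character offsets of the first occurrence of `phrase` in `text`."""
    start = text.find(phrase)
    if start < 0:
        raise ValueError(f"{phrase!r} not found")
    return (start, start + len(phrase))


# Hypothetical sentence and annotations from two annotators.
text = "The patient shows acute myocardial infarction of the anterior wall."
short_a = [span_of(text, "myocardial infarction")]
short_b = [span_of(text, "myocardial infarction")]
long_a = [span_of(text, "acute myocardial infarction of the anterior wall")]
long_b = [span_of(text, "myocardial infarction of the anterior wall")]

short_exact = agreement_f1(short_a, short_b, exact_match)
long_exact = agreement_f1(long_a, long_b, exact_match)
long_overlap = agreement_f1(long_a, long_b, overlap_match)
print(short_exact, long_exact, long_overlap)  # 1.0 0.0 1.0
```

In this toy case the annotators agree perfectly on the short span but score 0.0 on the long span under exact matching, although their long spans differ only in whether "acute" is included; overlap matching restores agreement. Greedy pairing is a simplification: an optimal one-to-one matching could score slightly higher when many overlapping spans compete.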
