Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Predicting Annotation Difficulty to Improve Task Routing and Model Performance for Biomedical Information Extraction

Abstract

Modern NLP systems require high-quality annotated data. In specialized domains, expert annotations may be prohibitively expensive. An alternative is to rely on crowdsourcing to reduce costs at the risk of introducing noise. In this paper we demonstrate that directly modeling instance difficulty can be used to improve model performance, and to route instances to appropriate annotators. Our difficulty prediction model combines two learned representations: a 'universal' encoder trained on out-of-domain data, and a task-specific encoder. Experiments on a complex biomedical information extraction task using expert and lay annotators show that: (ⅰ) simply excluding from the training data instances predicted to be difficult yields a small boost in performance; (ⅱ) using difficulty scores to weight instances during training provides further, consistent gains; (ⅲ) assigning instances predicted to be difficult to domain experts is an effective strategy for task routing. Our experiments confirm the expectation that for specialized tasks expert annotations are higher quality than crowd labels, and hence preferable to obtain if practical. Moreover, augmenting small amounts of expert data with a larger set of lay annotations leads to further improvements in model performance.
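
The three uses of difficulty scores listed in the abstract (filtering, instance weighting, and routing) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes each instance already has a predicted difficulty score in [0, 1] where higher means harder, and the function names, the `threshold` and `expert_budget` parameters, and the `1 - difficulty` weighting are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple


def filter_easy(scores: Sequence[float], threshold: float) -> List[int]:
    """(i) Keep indices of instances whose predicted difficulty is at most `threshold`."""
    return [i for i, s in enumerate(scores) if s <= threshold]


def instance_weights(scores: Sequence[float]) -> List[float]:
    """(ii) Down-weight difficult instances during training.

    Assumption: weight = 1 - difficulty, clamped to [0, 1]. The abstract
    does not state the weighting function; this is one simple choice.
    """
    return [min(1.0, max(0.0, 1.0 - s)) for s in scores]


def route(scores: Sequence[float], expert_budget: int) -> Tuple[List[int], List[int]]:
    """(iii) Send the `expert_budget` hardest instances to experts, the rest to the crowd."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    experts = sorted(ranked[:expert_budget])
    crowd = sorted(ranked[expert_budget:])
    return experts, crowd


if __name__ == "__main__":
    # Made-up difficulty scores for four instances, for illustration only.
    difficulty = [0.1, 0.9, 0.4, 0.7]
    print(filter_easy(difficulty, threshold=0.5))
    print(instance_weights(difficulty))
    print(route(difficulty, expert_budget=2))
```

In a real pipeline, the weights from `instance_weights` would typically multiply each instance's loss term before averaging, and the routing budget would be set by how much expert annotation is affordable.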
