External knowledge and query strategies in active learning: A study in clinical information extraction

Abstract

This paper presents a new active learning query strategy for information extraction, called Domain Knowledge Informativeness (DKI). Active learning is often used to reduce the annotation effort required to obtain training data for machine learning algorithms. A key component of an active learning approach is the query strategy, which iteratively selects samples for annotation. Knowledge resources have been used in information extraction to derive additional features for sample representation. DKI, however, is the first query strategy that exploits such resources to inform sample selection. To evaluate the merits of DKI, in particular the reduction in annotation effort it makes possible, we conduct a comprehensive empirical comparison of active learning query strategies for information extraction in the clinical domain. We chose the clinical domain because it offers extensive structured knowledge resources, which have often been exploited for feature generation. In addition, the clinical domain is a compelling use case for active learning because of the high costs and hurdles inherent in obtaining annotations there. Our experimental findings demonstrate that 1) among existing query strategies, those based on the classification model's confidence are a better choice for clinical data, as they perform equally well with a much lighter computational load, and 2) exploiting knowledge resources within active learning query strategies yields significant reductions in annotation effort, with up to 14% fewer tokens and concepts to annotate manually than with state-of-the-art query strategies.
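The abstract does not spell out how the confidence-based strategies (or DKI) are computed. As a rough illustration of the confidence-based family only, the Python sketch below implements generic least-confidence sampling for a sequence-labelling pool. The data layout, the function names `sequence_confidence` and `least_confidence_query`, and the approximation of sequence confidence as a product of per-token maximum probabilities are all assumptions made for illustration; this is not DKI and not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical pool representation (assumption): for each unlabeled sentence,
# the model's per-token label distributions, each inner sequence summing to 1.
TokenDists = Sequence[Sequence[float]]


def sequence_confidence(token_dists: TokenDists) -> float:
    """Approximate P(best label sequence | sentence) as the product of the
    per-token maximum probabilities (assumes token-level independence)."""
    # Summing logs avoids underflow on long sentences; max(dist) >= 1/len(dist) > 0.
    log_conf = sum(math.log(max(dist)) for dist in token_dists)
    return math.exp(log_conf)


def least_confidence_query(pool: Dict[str, TokenDists], batch_size: int) -> List[str]:
    """Return the ids of the `batch_size` sentences whose most likely labelling
    the model is least confident about (lowest confidence first)."""
    scored = sorted(pool.items(), key=lambda item: sequence_confidence(item[1]))
    return [sample_id for sample_id, _ in scored[:batch_size]]


if __name__ == "__main__":
    pool = {
        "s1": [[0.9, 0.1], [0.8, 0.2]],  # confidence 0.72
        "s2": [[0.6, 0.4], [0.5, 0.5]],  # confidence 0.30
        "s3": [[0.99, 0.01]],            # confidence 0.99
    }
    print(least_confidence_query(pool, 2))  # ['s2', 's1']
```

Scoring needs only one pass over the model's existing output probabilities, with no retraining or committee of models. That is consistent with the abstract's point that confidence-based strategies carry a much lighter computational load. A real CRF-based extractor would more likely take the Viterbi path probability than this per-token product.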
