Journal of Biomedical Informatics

Applying active learning to assertion classification of concepts in clinical text

Abstract

Supervised machine learning methods for clinical natural language processing (NLP) research require a large number of annotated samples, which are expensive to build because physicians must be involved in the annotation. Active learning, an approach that actively selects the samples to annotate from a large unlabeled pool, provides an alternative solution. Its major goal in classification is to reduce the annotation effort while maintaining the quality of the predictive model. However, few studies have investigated its use in clinical NLP. This paper reports an application of active learning to a clinical text classification task: determining the assertion status of clinical concepts. The annotated corpus for the assertion classification task in the 2010 i2b2/VA Clinical NLP Challenge was used in this study. We implemented several existing and newly developed active learning algorithms and assessed their performance. The outcome is reported as the global ALC score, based on the Area under the average Learning Curve of the AUC (Area Under the Curve) score. Results showed that when the same number of annotated samples was used, active learning strategies could generate better classification models (best ALC = 0.7715) than the passive learning method, random sampling (ALC = 0.7411). Moreover, to achieve the same classification performance, active learning strategies required fewer samples than random sampling. For example, to achieve an AUC of 0.79, random sampling used 32 samples, while our best active learning algorithm required only 12, a 62.5% reduction in manual annotation effort.
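The abstract relies on two mechanisms it does not spell out: how an active learner picks the next samples to annotate, and how an ALC score summarizes a learning curve. The Python sketch below illustrates both under stated assumptions. `select_by_margin` implements margin-based uncertainty sampling, one common existing strategy that is not necessarily among the algorithms the authors evaluated. `area_under_learning_curve` integrates AUC against log2(labeled-set size) with the trapezoidal rule and divides by the x-range; the log2 scaling and this normalization are assumptions, not the paper's published formula. `annotation_reduction` reproduces the abstract's 62.5% figure from the 32 versus 12 samples. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def select_by_margin(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Margin-based uncertainty sampling (illustrative; not the paper's code).

    probs[i] is the current model's class-probability vector for unlabeled
    pool item i (at least two classes). Items whose two most likely classes
    are closest in probability are the ones the model is least sure about,
    so their indices are returned for annotation.
    """
    def margin(p: Sequence[float]) -> float:
        top = sorted(p, reverse=True)
        return top[0] - top[1]

    # Smallest margin first = most uncertain first.
    ranked = sorted(range(len(probs)), key=lambda i: margin(probs[i]))
    return ranked[:k]


def area_under_learning_curve(sizes: Sequence[int], aucs: Sequence[float]) -> float:
    """Normalized area under a learning curve of AUC vs. labeled-set size.

    Assumption: the x-axis is log2(number of labeled samples), and the area
    is divided by the x-range, so a curve fixed at AUC = 1.0 scores 1.0.
    The paper's exact global-ALC normalization may differ.
    """
    if len(sizes) != len(aucs) or len(sizes) < 2:
        raise ValueError("need at least two (size, AUC) points of equal length")
    if any(b <= a for a, b in zip(sizes, sizes[1:])):
        raise ValueError("sizes must be strictly increasing")
    xs = [math.log2(n) for n in sizes]
    area = 0.0
    for i in range(1, len(xs)):
        # Trapezoidal rule between consecutive points of the learning curve.
        area += (xs[i] - xs[i - 1]) * (aucs[i] + aucs[i - 1]) / 2.0
    return area / (xs[-1] - xs[0])


def annotation_reduction(passive_n: int, active_n: int) -> float:
    """Fraction of annotations saved by reaching the same AUC with fewer samples."""
    return (passive_n - active_n) / passive_n


if __name__ == "__main__":
    # Figures from the abstract: AUC 0.79 reached with 32 random vs. 12 active samples.
    print(annotation_reduction(32, 12))  # → 0.625
```

One plausible reading of the abstract's "average learning curve" is to average the AUC over repeated runs at each labeled-set size before calling `area_under_learning_curve`. The log2 axis is a design choice that weights gains made with very few labels more heavily, which is where active learning is expected to help most.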