Clinical natural language processing workshop

On the diminishing return of labeling clinical reports

Abstract

Ample evidence suggests that steadily better machine learning models can be obtained by training on increasingly large datasets for natural language processing (NLP) problems in non-medical domains. Whether the same holds true for medical NLP has so far not been thoroughly investigated. This work shows that it does not always hold. We present the somewhat counter-intuitive observation that, contrary to common belief, performant medical NLP models can be obtained with a small amount of labeled data, most likely owing to the domain specificity of the problem. We quantify the effect of training-data size on the task of abnormality classification, using a fixed test set composed of two of the largest public chest x-ray radiology report datasets. The trained models not only use the training data efficiently but also outperform current state-of-the-art rule-based systems by a significant margin.
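The data-efficiency experiment the abstract describes (train on increasing amounts of labeled reports, evaluate on one fixed held-out test set, and trace the resulting learning curve) can be sketched as follows. The tiny bag-of-words Naive Bayes classifier and the toy report snippets below are illustrative stand-ins, not the authors' actual model or data.

```python
import random
from collections import Counter
from math import log

def train_nb(docs, labels):
    """Bag-of-words Naive Bayes: per-class word counts plus class priors."""
    counts = {0: Counter(), 1: Counter()}
    priors = Counter(labels)
    for doc, y in zip(docs, labels):
        counts[y].update(doc.split())
    vocab = set(counts[0]) | set(counts[1])
    return counts, priors, vocab

def predict(model, doc):
    counts, priors, vocab = model
    n = sum(priors.values())
    best, best_lp = 0, float("-inf")
    for y in (0, 1):
        if priors[y] == 0:  # class absent from a small training subset
            continue
        total = sum(counts[y].values())
        lp = log(priors[y] / n)
        for w in doc.split():
            # Laplace smoothing so unseen words do not zero out the score
            lp += log((counts[y][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = y, lp
    return best

# Toy stand-ins for radiology report sentences (hypothetical data):
# label 0 = normal, label 1 = abnormal.
normal = ["lungs are clear", "no acute abnormality", "heart size normal",
          "clear lungs no effusion"]
abnormal = ["right pleural effusion seen", "patchy opacity in left lobe",
            "effusion and consolidation", "opacity suggests pneumonia"]
random.seed(0)
train = [(d, 0) for d in normal * 4] + [(d, 1) for d in abnormal * 4]
random.shuffle(train)
test = [("no effusion lungs clear", 0), ("large opacity with effusion", 1)]

# Learning curve: accuracy on the FIXED test set vs. training-set size.
for size in (4, 8, len(train)):
    subset = train[:size]
    model = train_nb([d for d, _ in subset], [y for _, y in subset])
    acc = sum(predict(model, d) == y for d, y in test) / len(test)
    print(f"n={size:2d}  test accuracy={acc:.2f}")
```

The key design point is that the test set stays fixed while only the labeled training pool grows, so any flattening of the curve reflects diminishing returns from additional labels rather than a shifting evaluation target.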
