Venue: IEEE International Conference on Bioinformatics and Biomedicine

Active Learning with Deep Pre-trained Models for Sequence Tagging of Clinical and Biomedical Texts


Abstract

Active learning is a technique that helps to minimize the annotation budget required for the creation of a labeled dataset while maximizing the performance of a model trained on this dataset. It has been shown that active learning can be successfully applied to sequence tagging tasks of text processing in conjunction with deep learning models even when a limited amount of labeled data is available. Recent advances in transfer learning methods for natural language processing based on deep pre-trained models such as ELMo and BERT offer a much better ability to generalize on small annotated datasets compared to their shallow counterparts. The combination of deep pre-trained models and active learning leads to a powerful approach to dealing with annotation scarcity. In this work, we investigate the potential of this approach on clinical and biomedical data. The experimental evaluation shows that the combination of active learning and deep pre-trained models outperforms the standard methods of active learning. We also suggest a modification to a standard uncertainty sampling strategy and empirically show that it could be beneficial for annotation of very skewed datasets. Finally, we propose an annotation tool empowered with active learning and deep pre-trained models that could be used for entity annotation directly from Jupyter IDE.
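As a rough illustration of the standard uncertainty sampling strategy the abstract refers to, the sketch below ranks unlabeled sentences by a length-normalized log-confidence score and picks the least confident ones for annotation. It is a generic baseline under stated assumptions, not the modified strategy proposed in the paper (which the abstract does not describe). The function names and the toy probabilities are hypothetical; in practice the per-token tag distributions would come from a tagger built on a pre-trained model such as ELMo or BERT.

```python
import math
from typing import List, Sequence

# A sentence is represented as a list of per-token tag distributions.
Sentence = Sequence[Sequence[float]]


def normalized_log_confidence(token_probs: Sentence) -> float:
    """Mean log-probability of the most likely tag at each token.

    Lower values mean the tagger is less confident about the sentence.
    Dividing by the sentence length avoids favoring long sentences.
    """
    if not token_probs:
        return 0.0  # assumption: empty sentences count as fully confident
    return sum(math.log(max(dist)) for dist in token_probs) / len(token_probs)


def select_for_annotation(pool_probs: Sequence[Sentence], k: int) -> List[int]:
    """Return indices of the k least confident sentences in the pool."""
    scores = [normalized_log_confidence(s) for s in pool_probs]
    ranked = sorted(range(len(pool_probs)), key=lambda i: scores[i])
    return ranked[:k]


# Toy pool: three sentences with three possible tags per token.
pool = [
    [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]],   # confident predictions
    [[0.4, 0.3, 0.3], [0.5, 0.25, 0.25]],   # uncertain predictions
    [[0.6, 0.2, 0.2]],                      # in between
]

print(select_for_annotation(pool, 2))  # → [1, 2]
```

In an active learning loop, the selected sentences would be sent to an annotator, added to the labeled set, and the tagger retrained before the next round of selection.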
