
Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data



Abstract

Weak supervision has shown promising results in many natural language processing tasks, such as Named Entity Recognition (NER). Existing work mainly focuses on learning deep NER models with weak supervision only, i.e., without any human annotation, and shows that weakly labeled data alone can achieve good performance, though it still underperforms fully supervised NER trained on manually (strongly) labeled data. In this paper, we consider a more practical scenario, where we have both a small amount of strongly labeled data and a large amount of weakly labeled data. Unfortunately, we observe that weakly labeled data does not necessarily improve, and can even degrade, model performance (due to the extensive noise in the weak labels) when we train deep NER models over a simple or weighted combination of the strongly labeled and weakly labeled data. To address this issue, we propose a new multi-stage computational framework, NEEDLE, with three essential ingredients: (1) weak label completion, (2) a noise-aware loss function, and (3) final fine-tuning over the strongly labeled data. Through experiments on E-commerce query NER and Biomedical NER, we demonstrate that NEEDLE can effectively suppress the noise of the weak labels and outperforms existing methods. In particular, we achieve new SOTA F1-scores on 3 Biomedical NER datasets: BC5CDR-chem 93.74, BC5CDR-disease 90.69, NCBI-disease 92.28.
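The "simple or weighted combination" baseline that the abstract reports as unreliable can be pictured with a minimal sketch. This is not NEEDLE itself; the function names, the `weak_weight` parameter, and the toy probabilities are illustrative assumptions and do not come from the paper.

```python
import math

def token_nll(probs, labels):
    """Mean negative log-likelihood of the labeled tag over one sentence.

    probs: one list of tag probabilities per token.
    labels: the labeled tag index for each token.
    """
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def combined_loss(strong_batch, weak_batch, weak_weight=0.5):
    """Strong-label loss plus a down-weighted weak-label loss.

    Each batch is a list of (probs, labels) pairs, one pair per sentence.
    weak_weight is a hypothetical hyperparameter; 1.0 gives the simple
    (unweighted) combination.
    """
    strong = sum(token_nll(p, y) for p, y in strong_batch) / len(strong_batch)
    weak = sum(token_nll(p, y) for p, y in weak_batch) / len(weak_batch)
    return strong + weak_weight * weak

# Toy example: one single-token sentence per batch, two possible tags.
strong_batch = [([[0.5, 0.5]], [0])]
weak_batch = [([[0.5, 0.5]], [1])]
loss = combined_loss(strong_batch, weak_batch, weak_weight=0.5)
```

Because every weak label enters the objective at the same weight regardless of whether it is correct, noisy weak labels pull the model in the wrong direction, which is the failure the abstract attributes to this kind of combination.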
