Published in: International Joint Conference on Natural Language Processing

A Little Annotation does a Lot of Good: A Study in Bootstrapping Low-resource Named Entity Recognizers



Abstract

Most state-of-the-art models for named entity recognition (NER) rely on the availability of large amounts of labeled data, making them challenging to extend to new, lower-resourced languages. However, there are now several proposed approaches involving either cross-lingual transfer learning, which learns from other highly resourced languages, or active learning, which efficiently selects effective training data based on model predictions. This paper poses the question: given this recent progress, and limited human annotation, what is the most effective method for efficiently creating high-quality entity recognizers in under-resourced languages? Based on extensive experimentation using both simulated and real human annotation, we find a dual-strategy approach best, starting with a cross-lingual transferred model, then performing targeted annotation of only uncertain entity spans in the target language, minimizing annotator effort. Results demonstrate that cross-lingual transfer is a powerful tool when very little data can be annotated, but an entity-targeted annotation strategy can achieve competitive accuracy quickly, with just one-tenth of the training data. The code is publicly available.
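The idea of annotating only uncertain entity spans can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the transferred model exposes a per-token probability distribution over labels, and it uses a simple token-entropy threshold to group tokens into candidate spans. The function names, the `threshold` value, and the scoring rule (sum of token entropies) are all hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple

def token_entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of one token's predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_uncertain_spans(
    sentences: List[List[List[float]]],
    budget: int,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Tuple[int, int, int, float]]:
    """Pick up to `budget` spans for annotation.

    Each sentence is a list of per-token label distributions. A span is a
    maximal run of consecutive tokens whose entropy exceeds `threshold`;
    it is scored by the sum of its token entropies. Returns tuples of
    (sentence index, start, end exclusive, score), highest score first.
    """
    spans = []
    for s_idx, sent in enumerate(sentences):
        start, score = None, 0.0
        for t_idx, probs in enumerate(sent):
            h = token_entropy(probs)
            if h > threshold:
                if start is None:
                    start, score = t_idx, 0.0
                score += h
            elif start is not None:
                # The uncertain run ended at the previous token.
                spans.append((s_idx, start, t_idx, score))
                start = None
        if start is not None:
            # The run extends to the end of the sentence.
            spans.append((s_idx, start, len(sent), score))
    spans.sort(key=lambda s: s[3], reverse=True)
    return spans[:budget]

# Toy distributions over three labels (e.g. O, B, I), invented for the example.
toy_sentences = [
    [[0.98, 0.01, 0.01], [0.40, 0.35, 0.25], [0.34, 0.33, 0.33], [0.97, 0.02, 0.01]],
    [[0.90, 0.05, 0.05], [0.50, 0.50, 0.00]],
]
selected = select_uncertain_spans(toy_sentences, budget=2)
```

Under these assumptions, an annotator would be shown only the selected spans rather than whole sentences, which is the kind of effort reduction the abstract refers to.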

