Conference: Nordic Conference of Computational Linguistics

Projecting named entity recognizers without annotated or parallel corpora


Abstract

Named entity recognition (NER) is a task extensively researched in the field of NLP. NER typically requires large annotated corpora for training usable models. This is a problem for languages which lack large annotated corpora, such as Finnish. We propose an approach to create a named entity recognizer for Finnish by leveraging preexisting strong NER models for English, with no manually annotated data and no parallel corpora. We automatically gather a large amount of chronologically matched data in the two languages, then project named entity annotations from the English documents onto the Finnish ones, by resolving the matches with simple linguistic rules. We use this 'artificially' annotated data to train a BiLSTM-CRF NER model for Finnish. Our results show that this method can produce annotated instances with high precision, and the resulting model achieves state-of-the-art performance.
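
The projection step can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only says that matches are resolved "with simple linguistic rules", so the single rule used here (a Finnish token matches an English entity token if it starts with it and adds at most a short case suffix) is an assumption, as are the function name `project_entities`, the `max_suffix_len` parameter, and the BIO tag scheme.

```python
from typing import List, Tuple


def project_entities(
    english_entities: List[Tuple[str, str]],
    finnish_tokens: List[str],
    max_suffix_len: int = 4,
) -> List[str]:
    """Project (surface form, label) pairs onto Finnish tokens as BIO tags.

    Hypothetical rule: every entity token except the last must match exactly
    (case-insensitively); the last token may carry an inflectional suffix of
    at most `max_suffix_len` characters.
    """
    tags = ["O"] * len(finnish_tokens)
    for surface, label in english_entities:
        parts = surface.lower().split()
        if not parts:
            continue  # nothing to project for an empty surface form
        n = len(parts)
        for start in range(len(finnish_tokens) - n + 1):
            # Do not overwrite tokens that an earlier entity already claimed.
            if any(tags[start + k] != "O" for k in range(n)):
                continue
            window = [t.lower() for t in finnish_tokens[start:start + n]]
            if window[:-1] != parts[:-1]:
                continue
            last = window[-1]
            suffix_len = len(last) - len(parts[-1])
            if last.startswith(parts[-1]) and suffix_len <= max_suffix_len:
                tags[start] = "B-" + label
                for k in range(1, n):
                    tags[start + k] = "I-" + label
    return tags


# Entities as an English NER model might output them for a matched document.
entities = [("Barack Obama", "PER"), ("Nokia", "ORG")]
tokens = ["Barack", "Obaman", "mukaan", "Nokian", "tulos", "parani"]
projected = project_entities(entities, tokens)
```

A prefix rule like this misses names whose stem changes under inflection (for example `Helsinki` and `Helsingissä`), so a realistic rule set would need more than this one check.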
