Projecting named entity recognizers without annotated or parallel corpora

Abstract

Named entity recognition (NER) is a task extensively researched in the field of NLP. NER typically requires large annotated corpora for training usable models. This is a problem for languages which lack large annotated corpora, such as Finnish. We propose an approach to create a named entity recognizer for Finnish by leveraging preexisting strong NER models for English, with no manually annotated data and no parallel corpora. We automatically gather a large amount of chronologically matched data in the two languages, then project named entity annotations from the English documents onto the Finnish ones, by resolving the matches with simple linguistic rules. We use this 'artificially' annotated data to train a BiLSTM-CRF NER model for Finnish. Our results show that this method can produce annotated instances with high precision, and the resulting model achieves state-of-the-art performance.
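The projection step described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the "simple linguistic rules", so the rule used here (matching a Finnish token against the leading characters of an English entity word, to tolerate Finnish case suffixes) is an assumption for illustration only, not the authors' method. The names `stem` and `project_entities` and the `min_stem` parameter are hypothetical.

```python
from typing import List, Tuple


def stem(word: str, min_stem: int = 4) -> str:
    """Crude stem (an assumption): drop the last two characters,
    but keep at least `min_stem` characters of the lowercased word."""
    word = word.lower()
    keep = max(min_stem, len(word) - 2)
    return word[:keep]


def project_entities(
    english_entities: List[Tuple[str, str]],
    finnish_tokens: List[str],
    min_stem: int = 4,
) -> List[str]:
    """Assign BIO tags to Finnish tokens by prefix-matching entity names
    that an English NER model found in a matched English document."""
    tags = ["O"] * len(finnish_tokens)
    lowered = [token.lower() for token in finnish_tokens]
    for name, label in english_entities:
        stems = [stem(word, min_stem) for word in name.split()]
        if not stems:
            continue
        n = len(stems)
        for start in range(len(lowered) - n + 1):
            # Skip windows that overlap an entity that is already tagged.
            if any(tags[i] != "O" for i in range(start, start + n)):
                continue
            # Every word of the entity must match the token at its position.
            if all(lowered[start + k].startswith(stems[k]) for k in range(n)):
                tags[start] = "B-" + label
                for i in range(start + 1, start + n):
                    tags[i] = "I-" + label
    return tags


# Entities as an English NER model might return them (illustrative input).
entities = [("Sauli Niinistö", "PER"), ("Helsinki", "LOC")]
tokens = ["Sauli", "Niinistön", "puhe", "Helsingissä"]
print(project_entities(entities, tokens))
# ['B-PER', 'I-PER', 'O', 'B-LOC']
```

Token sequences tagged this way could then serve as automatically annotated training data for a sequence tagger such as the BiLSTM-CRF model mentioned in the abstract. A real system would need stricter rules to keep precision high.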


Bibliographic details

Source: Nordic Conference of Computational Linguistics, 2019, pp. 232-241 (10 pages)
Conference location: Turku (FI)
Authors: Jue Hou; Maximilian W. Koppatz; Jose Maria Hoya Quecedo; Roman Yangarber
Affiliation: University of Helsinki, Department of Computer Science, Finland
Original format: PDF
Date indexed: 2022-08-26 14:42:13

Similar literature

1. Comparison of Methods to Annotate Named Entity Corpora [J]. Komiya Kanako, Suzuki Masaya, Iwakura Tomoya, ACM Transactions on Asian Language Information Processing. 2018, No. 4
2. Generating Chinese named entity data from parallel corpora [J]. Ruiji FU, Bing QIN, Ting LIU, Frontiers of Computer Science in China. 2014, No. 4
3. Alignment of Bilingual Named Entities in Parallel Corpora Using Statistical Models and Multiple Knowledge Sources [J]. CHUN-JEN LEE, JASON S. CHANG, JYH-SHING R. JANG, ACM Transactions on Asian Language Information Processing. 2006, No. 2
4. Projecting named entity recognizers without annotated or parallel corpora [C]. Jue Hou, Maximilian W. Koppatz, Jose Maria Hoya Quecedo, Roman Yangarber, Nordic Conference of Computational Linguistics. 2019
5. Semi-supervised Named Entity Recognition: Learning to recognize 100 entity types with little supervision [D]. Nadeau, David. 2007
6. Assessment of disease named entity recognition on a corpus of annotated sentences [O]. Antonio Jimeno, Ernesto Jimenez-Ruiz, Vivian Lee, 2008
7. Learning a Unified Named Entity Tagger from Multiple Partially Annotated Corpora for Efficient Adaptation [O]. Xiao Huang, Li Dong, Elizabeth Boschee, 2019
