ACM Transactions on Asian Language Information Processing

Improving NER Tagging Performance in Low-Resource Languages via Multilingual Learning

Abstract

Existing supervised solutions for Named Entity Recognition (NER) typically rely on a large annotated corpus. Collecting a large NER-annotated corpus is time-consuming and requires considerable human effort. Collecting a small annotated corpus is feasible for any language, but NER performance then degrades because of data sparsity. We address data sparsity by borrowing features from the data of a closely related language. We use hierarchical neural networks to train a supervised NER system. Features are borrowed from the closely related language through the shared layers of the network. The network is trained on the combined dataset of the low-resource language and the closely related language, a setup also termed multilingual learning. Unlike existing systems, we share all layers of the network between the two languages. We apply multilingual learning to NER in Indian languages and empirically show its benefits over a monolingual deep learning system and over a traditional machine-learning system with some feature engineering. We show that multilingual learning improves NER performance on the low-resource language mainly because of (1) an increased named entity vocabulary, (2) cross-lingual subword features, and (3) multilingual learning acting as a regularizer.
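The abstract says the network is trained on the combined data of both languages with all layers shared. One possible reading is that the embedding layers are shared too. In that case the two corpora would be merged into one training set with a single word, character and tag vocabulary. The sketch below covers only that data-preparation step under these assumptions. It is not the authors' implementation, and the hierarchical network itself is not shown. The function names (`build_multilingual_corpus`, `build_shared_vocab`, `entity_vocabulary`), the BIO tag scheme and the toy sentences are all hypothetical illustrations.

```python
import random
from collections import Counter
from typing import Dict, List, Set, Tuple

# A sentence is a list of (token, tag) pairs; BIO tags are an assumption.
Sentence = List[Tuple[str, str]]


def build_multilingual_corpus(low_res: List[Sentence], related: List[Sentence],
                              seed: int = 0) -> List[Sentence]:
    """Hypothetical helper: concatenate both corpora and shuffle them so that
    training batches mix sentences from the two languages."""
    combined = list(low_res) + list(related)
    random.Random(seed).shuffle(combined)
    return combined


def build_shared_vocab(corpus: List[Sentence]) -> Tuple[Dict[str, int], Dict[str, int], Dict[str, int]]:
    """Hypothetical helper: one word, character and tag index over both languages,
    so that a fully shared network (including embeddings) sees a single vocabulary."""
    words, chars, tags = Counter(), Counter(), Counter()
    for sent in corpus:
        for tok, tag in sent:
            words[tok] += 1
            chars.update(tok)  # counts each character of the token
            tags[tag] += 1

    def index(counter: Counter) -> Dict[str, int]:
        # Ids 0 and 1 are reserved for padding and unknown items.
        vocab = {"<pad>": 0, "<unk>": 1}
        for item in sorted(counter):
            vocab[item] = len(vocab)
        return vocab

    tag_index = {t: i for i, t in enumerate(sorted(tags))}
    return index(words), index(chars), tag_index


def entity_vocabulary(corpus: List[Sentence]) -> Set[str]:
    """Tokens that carry a non-O tag, i.e. the named entity vocabulary."""
    return {tok for sent in corpus for tok, tag in sent if tag != "O"}


# Toy, made-up sentences standing in for the two languages.
low = [[("Ram", "B-PER"), ("went", "O"), ("to", "O"), ("Pune", "B-LOC")]]
related = [[("Sita", "B-PER"), ("lives", "O"), ("in", "O"), ("Pune", "B-LOC")]]

combined = build_multilingual_corpus(low, related)
words, chars, tags = build_shared_vocab(combined)
print(len(entity_vocabulary(low)), len(entity_vocabulary(combined)))  # 2 3
```

In this toy case, merging the corpora grows the entity vocabulary from two to three items. This loosely mirrors point (1) of the abstract, and the shared character index is one way cross-lingual subword features could arise.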