European Conference on IR Research

Contextualized Embeddings in Named-Entity Recognition: An Empirical Study on Generalization



Abstract

Contextualized embeddings use unsupervised language model pretraining to compute word representations that depend on their context. This is intuitively useful for generalization, especially in Named-Entity Recognition, where it is crucial to detect mentions never seen during training. However, standard English benchmarks overestimate the importance of lexical over contextual features because of an unrealistic lexical overlap between train and test mentions. In this paper, we perform an empirical analysis of the generalization capabilities of state-of-the-art contextualized embeddings by separating mentions by novelty and with out-of-domain evaluation. We show that they are particularly beneficial for unseen mention detection, especially out-of-domain. For models trained on CoNLL03, language model contextualization leads to a +1.2% maximal relative micro-F1 score increase in-domain, against +13% out-of-domain on the WNUT dataset.
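A minimal sketch (not from the paper) of the kind of evaluation the abstract describes: test mentions are partitioned by novelty, i.e. whether their surface form occurs among training mentions, and model scores are compared as relative micro-F1 changes. The function names and toy data below are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch: split test mentions by novelty with respect to the
# training set, and compute a relative (not absolute) micro-F1 change.

def split_by_novelty(train_mentions, test_mentions):
    """Partition test mentions by whether their surface form was seen in training."""
    seen_forms = {m.lower() for m in train_mentions}
    seen = [m for m in test_mentions if m.lower() in seen_forms]
    unseen = [m for m in test_mentions if m.lower() not in seen_forms]
    return seen, unseen

def relative_f1_increase(f1_baseline, f1_contextual):
    """Relative micro-F1 change in percent, e.g. the +13% reported out-of-domain."""
    return (f1_contextual - f1_baseline) / f1_baseline * 100

if __name__ == "__main__":
    # Toy mentions for illustration only.
    train = ["Germany", "United Nations", "Peter Blackburn"]
    test = ["Germany", "WNUT", "Peter Blackburn", "SoundCloud"]
    seen, unseen = split_by_novelty(train, test)
    print(seen)    # ['Germany', 'Peter Blackburn']
    print(unseen)  # ['WNUT', 'SoundCloud']
    # Hypothetical baseline vs. contextualized scores: a 10.4-point gain on
    # a baseline of 80.0 is a +13.0% relative increase.
    print(f"{relative_f1_increase(80.0, 90.4):+.1f}%")  # +13.0%
```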
