Generalisation in named entity recognition: A quantitative analysis

Isabelle Augenstein; Leon Derczynski; Kalina Bontcheva

Abstract

Named Entity Recognition (NER) is a key NLP task, which is all the more challenging on Web and user-generated content with their diverse and continuously changing language. This paper aims to quantify how this diversity impacts state-of-the-art NER methods, by measuring named entity (NE) and context variability, feature sparsity, and their effects on precision and recall. In particular, our findings indicate that NER approaches struggle to generalise in diverse genres with limited training data. Unseen NEs, in particular, play an important role, which have a higher incidence in diverse genres such as social media than in more regular genres such as newswire. Coupled with a higher incidence of unseen features more generally and the lack of large training corpora, this leads to significantly lower F1 scores for diverse genres as compared to more regular ones. We also find that leading systems rely heavily on surface forms found in training data, having problems generalising beyond these, and offer explanations for this observation.
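
The quantities named in the abstract (incidence of unseen NEs, precision, recall, F1) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `unseen_rate` and `precision_recall_f1`, the representation of an entity as a `(surface form, type)` pair, the case-insensitive matching, and the toy data are all assumptions made for illustration. Scoring over sets of pairs is a simplification of span-level NER evaluation.

```python
from typing import Iterable, Set, Tuple

# Assumption: an entity mention is a (surface form, type) pair, e.g. ("Sheffield", "LOC").
Entity = Tuple[str, str]


def unseen_rate(train: Iterable[Entity], test: Iterable[Entity]) -> float:
    """Fraction of test mentions whose surface form never occurs in the training data."""
    seen = {surface.lower() for surface, _ in train}
    test_mentions = list(test)
    if not test_mentions:
        return 0.0
    unseen = sum(1 for surface, _ in test_mentions if surface.lower() not in seen)
    return unseen / len(test_mentions)


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over sets of entities (simplified: no span offsets)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data, not taken from the paper.
train_entities = [("London", "LOC"), ("Obama", "PER")]
test_entities = [("London", "LOC"), ("Bieber", "PER"), ("Sheffield", "LOC"), ("Obama", "PER")]
predicted_entities = {("London", "LOC"), ("Obama", "PER"), ("Bieber", "ORG")}

rate = unseen_rate(train_entities, test_entities)
precision, recall, f1 = precision_recall_f1(set(test_entities), predicted_entities)
```

In this toy setup the tagger gets both seen surface forms right and fails on both unseen ones, which is the pattern the abstract describes: recall drops as the share of unseen NEs rises.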

Bibliographic details

Source
Computer Speech and Language | 2017, No. 7 | pp. 61-83 | 23 pages
Authors
Isabelle Augenstein; Leon Derczynski; Kalina Bontcheva
Author affiliations

University of Sheffield, Sheffield, S14DP, UK;

University of Sheffield, Sheffield, S14DP, UK;

University of Sheffield, Sheffield, S14DP, UK;

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Text language: English
Chinese Library Classification:
Keywords
Natural language processing; Information extraction; Named entity recognition; Generalisation; Entity drift; Social media; Quantitative study;

Date added to database: 2022-08-18 02:11:09

Similar literature

1. Myanmar named entity corpus and its use in syllable-based neural named entity recognition [J]. Hsu Myat Mo, Khin Mar Soe. International Journal of Electrical and Computer Engineering. 2020, No. 2
2. Named entity recognition goes to old regime France: geographic text analysis for early modern French corpora [J]. McDonough Katherine, Moncla Ludovic, van de Camp Matje. International Journal of Geographical Information Science. 2019, No. 11a12
3. Effective integration of morphological analysis and named entity recognition based on a recurrent neural network [J]. Lee Hyeon-gu, Park Geonwoo, Kim Harksoo. Pattern Recognition Letters. 2018, No. SEPa1
4. Quantitative Analysis of Art Market Using Ontologies, Named Entity Recognition and Machine Learning: A Case Study [C]. Dominik Filipiak, Henning Agt-Rickaue, Christian Hentschel, International Conference on Business Information Systems. 2016
5. Semi-supervised Named Entity Recognition: Learning to recognize 100 entity types with little supervision [D]. Nadeau, David. 2007
6. Precursor-induced conditional random fields: connecting separate entities by induction for improved clinical named entity recognition [O]. Wangjin Lee, Jinwook Choi. 2019
7. Generalisation in Named Entity Recognition: A Quantitative Analysis [O]. Augenstein, Isabelle, Derczynski, Leon, Bontcheva, Kalina. 2017
