Ibero-American Conference on AI (IBERAMIA 2004); November 22-26, 2004; Puebla (IT)

Improving the Performance of a Named Entity Extractor by Applying a Stacking Scheme


Abstract

In this paper we investigate how to improve the performance of a Named Entity Extraction (NEE) system by applying machine learning techniques and corpus transformation. The main resources used in our experiments are the publicly available tagger TnT and a corpus of Spanish texts in which named entity occurrences are tagged with BIO tags. We split the NEE task into two subtasks: 1) Named Entity Recognition (NER), which involves identifying the group of words that make up the name of an entity, and 2) Named Entity Classification (NEC), which determines the category of a named entity. We have focused our work on improving the NER task, generating four different taggers from the same training corpus and combining them using a stacking scheme. We improve the baseline of the NER task (F_(β=1) value of 81.84) up to a value of 88.37. When a NEC module is added to the NER system, the performance of the whole NEE task also improves: a value of 70.47 is achieved from a baseline of 66.07.
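The abstract reports NER quality as an F_(β=1) score over entities marked with BIO tags. As a rough illustration only, the sketch below converts a BIO tag sequence into entity spans and computes an F-measure over exact span matches. The function names `bio_to_spans` and `f_beta`, the untyped tag set (`B`, `I`, `O`), and the exact-match criterion are assumptions made for this example; the paper's own evaluation procedure is not described in the abstract and may differ.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a sequence of B/I/O tags (hypothetical helper)."""
    spans: Set[Span] = set()
    start = None  # index where the currently open entity began, if any
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.add((start, i))  # close the previous entity
            start = i
        elif tag == "I":
            if start is None:
                start = i  # tolerate an I tag with no preceding B
        else:  # "O" closes any open entity
            if start is not None:
                spans.add((start, i))
                start = None
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def f_beta(gold: Set[Span], pred: Set[Span], beta: float = 1.0) -> float:
    """F-measure over exact span matches; beta=1 weights precision and recall equally."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy data, not taken from the paper's corpus.
gold_spans = bio_to_spans(["B", "I", "O", "B", "O"])
pred_spans = bio_to_spans(["B", "I", "O", "O", "B"])
score = f_beta(gold_spans, pred_spans)
print(f"F1 = {score:.2f}")
```

The function returns a value between 0 and 1, whereas the scores quoted in the abstract appear to be on a 0 to 100 scale, so a result here would be multiplied by 100 for comparison.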
