Conference: Nordic Conference of Computational Linguistics

Towards High Accuracy Named Entity Recognition for Icelandic

Abstract

We report on work in progress which consists of annotating an Icelandic corpus for named entities (NEs) and using it for training a named entity recognizer based on a Bidirectional Long Short-Term Memory model. Currently, we have annotated 7,538 NEs appearing in the first 200,000 tokens of a 1 million token corpus, MIM-GOLD, originally developed for serving as a gold standard for part-of-speech tagging. Our best performing model, trained on this subset of MIM-GOLD, and enriched with external word embeddings, obtains an overall F_1 score of 81.3% when categorizing NEs into the following four categories: persons, locations, organizations and miscellaneous. Our preliminary results are promising, especially given the fact that 80% of MIM-GOLD has not yet been used for training.
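
To make the reported figures concrete, here is a minimal sketch of how an entity-level F_1 score can be computed. The abstract does not describe the evaluation protocol, so exact-match scoring over (start, end, label) spans is an assumption here, and the function name `entity_f1` and the toy spans are hypothetical, not taken from the paper. The last lines redo the corpus arithmetic implied by the abstract.

```python
def entity_f1(gold, predicted):
    """Return (precision, recall, F1) over sets of (start, end, label) spans.

    Assumption: a predicted entity counts as correct only if its span
    boundaries and its label both match a gold entity exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data using the four categories named in the abstract.
gold_spans = [(0, 2, "PER"), (5, 6, "LOC"), (9, 11, "ORG"), (14, 15, "MISC")]
pred_spans = [(0, 2, "PER"), (5, 6, "ORG"), (9, 11, "ORG"), (20, 21, "LOC")]
precision, recall, f1 = entity_f1(gold_spans, pred_spans)

# Corpus arithmetic from the abstract: 200,000 of 1,000,000 tokens annotated,
# containing 7,538 named entities.
annotated_fraction = 200_000 / 1_000_000
entities_per_100_tokens = 7_538 / 200_000 * 100
```

In the toy data, two of the four predictions match a gold entity exactly, so precision, recall and F_1 are all 0.5. The annotated fraction of 0.2 is consistent with the statement that 80% of MIM-GOLD has not yet been used, and the annotated subset contains roughly 3.8 named entities per 100 tokens.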
