Venue: International Joint Conference on Natural Language Processing

WiNER: A Wikipedia Annotated Corpus for Named Entity Recognition

Abstract

We revisit the idea of mining Wikipedia to generate named-entity annotations. We propose a new methodology, which we apply to the English Wikipedia to build WiNER, a large, high-quality annotated corpus. We evaluate its usefulness on 6 NER tasks, comparing 4 popular state-of-the-art approaches. We show that LSTM-CRF is the approach that benefits the most from our corpus. We report substantial gains with this model when a small portion of WiNER is added to the CoNLL training material. Finally, we propose a simple but efficient method for exploiting the full range of WiNER, which leads to further improvements.