
Robust Extraction of Named Entity Including Unfamiliar Word


Abstract

This paper proposes a novel method for extracting named entities that include unfamiliar words, that is, words that do not occur, or occur only a few times, in the training corpus. The method draws on a large unannotated corpus and consists of two steps. In the first step, each unfamiliar word is assigned the most similar familiar word, based on context vectors calculated from the large unannotated corpus. In the second step, traditional machine learning approaches are applied. Experiments on extracting Japanese named entities from the IREX corpus and the NHK corpus show the effectiveness of the proposed method.
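The first step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the context vectors are built or how similarity is measured, so everything specific here is an assumption made for illustration: window-based co-occurrence counts as context vectors, cosine similarity, a frequency threshold `min_count` for deciding what counts as "unfamiliar", and the function names `context_vectors`, `cosine`, and `map_unfamiliar`. It is not the authors' implementation.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=2):
    """Build co-occurrence count vectors (assumed form of a context vector)."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_unfamiliar(training_sentences, unannotated_sentences,
                   min_count=2, window=2):
    """Map each unfamiliar word to its most similar familiar word.

    A word is treated as unfamiliar when it occurs fewer than `min_count`
    times in the training corpus (hypothetical threshold).
    """
    freq = Counter(w for s in training_sentences for w in s)
    vectors = context_vectors(unannotated_sentences, window)
    familiar = [w for w in vectors if freq[w] >= min_count]
    mapping = {}
    for word in vectors:
        if freq[word] >= min_count:
            continue  # already familiar, no substitution needed
        best, best_sim = None, 0.0
        for candidate in familiar:
            sim = cosine(vectors[word], vectors[candidate])
            if sim > best_sim:
                best, best_sim = candidate, sim
        if best is not None:
            mapping[word] = best
    return mapping


# Toy data: "osaka" never appears in the training corpus.
training = [["tokyo", "is", "big"], ["tokyo", "is", "busy"]]
unannotated = [
    ["tokyo", "is", "big"], ["osaka", "is", "big"],
    ["tokyo", "is", "busy"], ["osaka", "is", "busy"],
]
mapping = map_unfamiliar(training, unannotated)
```

In a pipeline like the one the abstract outlines, the resulting mapping could let the second-step learner treat an unfamiliar word as its familiar counterpart when computing features. How the paper actually feeds the substitution into the learner is not stated in the abstract.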
