DOCUMENT-SPECIFIC GAZETTEERS FOR NAMED ENTITY RECOGNITION

Abstract

A method for entity recognition employs document-level entity tags which correspond to mentions appearing in the document, without specifying their locations. A named entity recognition model is trained on features extracted from text samples tagged with document-level entity tags. A text document to be labeled is received, the text document being tagged with at least one document-level entity tag. A document-specific gazetteer is generated, based on the at least one document-level entity tag. The gazetteer includes a set of entries, one entry for each of a set of entity names. For a text sequence of the document, features for tokens of the text sequence are extracted. The features include document-specific features for tokens matching at least a part of the entity name of one of the gazetteer entries. Entity labels are predicted for the tokens in the text sequence with the named entity recognition model, based on the extracted features.
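
The gazetteer and feature-extraction steps described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the names `GazetteerEntry`, `build_gazetteer`, and `document_specific_features`, the `gaz_full`/`gaz_part` feature names, the case-insensitive whitespace tokenization, and the `ORG`/`PER` label set are all assumptions made for illustration. The sketch covers only the document-specific features; the trained named entity recognition model that consumes them is out of scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GazetteerEntry:
    name: str         # entity name taken from a document-level tag
    entity_type: str  # e.g. "ORG" or "PER" (label set is an assumption)


def build_gazetteer(doc_tags):
    """Build one entry per distinct entity name among the document-level tags.

    doc_tags is an iterable of (name, entity_type) pairs; the tags carry
    no information about where the mentions occur in the document.
    """
    entries = {}
    for name, entity_type in doc_tags:
        entries.setdefault(name.lower(), GazetteerEntry(name, entity_type))
    return list(entries.values())


def document_specific_features(tokens, gazetteer):
    """Return, for each token, a sorted list of document-specific features.

    gaz_full:<type> marks tokens inside a span equal to a whole entity name.
    gaz_part:<type> marks tokens equal to any single token of an entity name.
    """
    lowered = [t.lower() for t in tokens]
    features = [set() for _ in tokens]
    for entry in gazetteer:
        name_tokens = entry.name.lower().split()
        n = len(name_tokens)
        # Full match: the token span reproduces the entire entity name.
        for i in range(len(tokens) - n + 1):
            if lowered[i:i + n] == name_tokens:
                for j in range(i, i + n):
                    features[j].add(f"gaz_full:{entry.entity_type}")
        # Partial match: the token matches at least part of the entity name.
        name_set = set(name_tokens)
        for i, tok in enumerate(lowered):
            if tok in name_set:
                features[i].add(f"gaz_part:{entry.entity_type}")
    return [sorted(f) for f in features]


if __name__ == "__main__":
    tags = [("Acme Corp", "ORG"), ("Jane Smith", "PER")]
    tokens = ["Acme", "Corp", "hired", "Jane", "Smith", ".", "Smith", "starts", "Monday"]
    gazetteer = build_gazetteer(tags)
    for token, feats in zip(tokens, document_specific_features(tokens, gazetteer)):
        print(token, feats)
```

In this sketch, a later bare mention such as "Smith" receives only the partial-match feature, which is the kind of signal a sequence model could use to label abbreviated mentions of an entity known to appear somewhere in the document.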