International Conference on Automation Information

Optimizing the Features of CRF-based Named Entity Recognition for Patent Documents

Abstract

This paper proposes feature optimization for a named-entity recognition (NER) system for patent documents. The system automatically recognizes named entities such as product names, service names, and technology names, which are topic words in patent description sentences. NER can be treated as a tagging problem in which NER tags are attached to the named entities in a given sentence. We applied a conditional random field (CRF), a machine learning model widely accepted as an excellent method for tagging problems. We constructed an NER-tagged patent corpus containing 668,260 tokens (approximately 15,000 sentences) of training data and 74,248 tokens of test data. We evaluated each CRF feature and tried to find an optimal CRF feature set. The accuracy of our NER system was 93.69%, with an F1 score of 65.40.
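
The abstract does not list the CRF features or say how the two scores were computed, so the following is only a minimal sketch under stated assumptions. It assumes a BIO tagging scheme, a handful of hypothetical token features (`token_features`), and hypothetical tag names (`TECH`, `PROD`). It also assumes accuracy is measured per token and F1 per exact entity span, which is one plausible reason a high accuracy can sit beside a much lower F1: most tokens carry the `O` tag and are easy to label correctly.

```python
from typing import Dict, List, Set, Tuple


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Hypothetical per-token features of the kind a CRF tagger consumes.

    These are illustrative only; the paper's actual feature set is not
    given in the abstract.
    """
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "suffix3": word[-3:],
        "is_capitalized": word[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def extract_entities(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Collect (start, end_exclusive, type) spans from a BIO tag sequence."""
    spans: Set[Tuple[int, int, str]] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """F1 over exact-match entity spans (assumed evaluation scheme)."""
    gold_spans, pred_spans = extract_entities(gold), extract_entities(pred)
    true_pos = len(gold_spans & pred_spans)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_spans)
    recall = true_pos / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Made-up example sentence and tags, not taken from the paper's corpus.
tokens = ["The", "wireless", "charging", "module", "is",
          "connected", "to", "the", "battery", "pack"]
gold = ["O", "B-TECH", "I-TECH", "O", "O", "O", "O", "O", "B-PROD", "I-PROD"]
pred = ["O", "B-TECH", "I-TECH", "O", "O", "O", "O", "O", "B-PROD", "O"]

print(token_features(tokens, 1))
print("token accuracy:", token_accuracy(gold, pred))
print("entity F1:", entity_f1(gold, pred))
```

In this toy case a single wrong tag leaves nine of ten tokens correct, yet only one of the two gold entities is matched exactly, so the span-level score drops far more than the token-level one.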