Optimizing the Features of CRF-based Named Entity Recognition for Patent Documents

Abstract

This paper proposes a feature optimization for a named-entity recognition (NER) system for patent documents. The system automatically recognizes named entities such as product names, service names, and technology names, which are the topic words in patent description sentences. NER can be approached as a tagging problem in which NER tags are attached to the named entities in a given sentence. We applied conditional random fields (CRF), a machine learning model widely accepted as an excellent method for tagging problems. We constructed an NER-tagged patent corpus that includes 668,260 tokens (approximately 15,000 sentences) of training data and 74,248 tokens of test data. We evaluated each CRF feature and sought an optimal CRF feature set. Our NER system achieved an accuracy of 93.69% with an F1 score of 65.40.
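
To make the tagging formulation concrete, here is a minimal Python sketch of how entity annotations can become per-token tags and per-token CRF features. It assumes a BIO tag scheme, hypothetical labels PRODUCT and TECH, a made-up example sentence, and an illustrative feature set (lowercased word, 3-character suffix, capitalization, digits, and a one-word window on each side). The abstract does not specify the paper's actual tag set or features.

```python
from typing import Dict, List, Tuple

# Hypothetical span format: (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode entity spans as BIO tags: B-X on the first token, I-X after, O elsewhere."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def token_features(tokens: List[str], i: int, window: int = 1) -> Dict[str, str]:
    """Illustrative (hypothetical) features: the word, simple shape cues, and neighbours."""
    word = tokens[i]
    feats = {
        "w[0]": word.lower(),
        "suffix3": word[-3:].lower(),
        "is_upper": str(word.isupper()),
        "has_digit": str(any(c.isdigit() for c in word)),
    }
    # Context words at offsets -window..+window, padded at sentence boundaries.
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        feats[f"w[{off:+d}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    return feats


if __name__ == "__main__":
    # Made-up sentence and annotations, for illustration only.
    tokens = ["The", "LTE", "modem", "supports", "Wi-Fi", "Direct", "."]
    spans = [(1, 3, "PRODUCT"), (4, 6, "TECH")]
    print(spans_to_bio(len(tokens), spans))
    # → ['O', 'B-PRODUCT', 'I-PRODUCT', 'O', 'B-TECH', 'I-TECH', 'O']
    print(token_features(tokens, 1))
```

Each feature dictionary corresponds to one token's input to a CRF toolkit. With CRF++, for example, the same information is typically written as whitespace-separated columns (one token per line, gold tag in the last column) and referenced from a feature template.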
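
The abstract says each CRF feature was evaluated in order to find an optimal feature set, but it does not describe the search procedure. One common approach is greedy forward selection, sketched below in Python as an assumption rather than as the authors' method. The scorer is a hypothetical callback: in a real setup it might write a CRF++ template for the candidate features, train with `crf_learn`, tag a held-out set with `crf_test`, and return F1. The toy gains in the example (including an `np_chunk` feature named after the base-NP chunking keyword) are invented for illustration and are not results from the paper.

```python
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical scorer: maps a feature set to a held-out score (e.g. F1).
ScoreFn = Callable[[FrozenSet[str]], float]


def greedy_forward_selection(candidates: List[str], score_fn: ScoreFn) -> Tuple[FrozenSet[str], float]:
    """Add one feature per round, keeping the best addition only if it improves the score."""
    selected: FrozenSet[str] = frozenset()
    best = score_fn(selected)
    remaining = list(candidates)
    while remaining:
        # Evaluate every remaining feature added to the current set.
        scored = [(score_fn(selected | {f}), f) for f in remaining]
        top_score, top_feat = max(scored)
        if top_score <= best:
            break  # no remaining feature improves the score; stop searching
        selected = selected | {top_feat}
        best = top_score
        remaining.remove(top_feat)
    return selected, best


def toy_score(features: FrozenSet[str]) -> float:
    # Invented per-feature gains for illustration only; not data from the paper.
    gains = {"word": 40.0, "pos": 10.0, "np_chunk": 5.0, "suffix": -2.0}
    return sum(gains[f] for f in features)


if __name__ == "__main__":
    chosen, score = greedy_forward_selection(["word", "pos", "np_chunk", "suffix"], toy_score)
    print(sorted(chosen), score)  # → ['np_chunk', 'pos', 'word'] 55.0
```

Greedy selection needs one training run per remaining feature per round, so its cost grows roughly quadratically with the number of candidate features; ablation (removing one feature at a time from the full set) is a cheaper alternative.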
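
The reported accuracy (93.69%) is much higher than the F1 score (65.40). The abstract does not define either metric. Assuming accuracy is computed per token and F1 per entity (exact span and label match), a gap like this is expected: most tokens lie outside any entity (tag O) and are easy to label correctly, while a single boundary error makes a whole entity count as wrong. The Python sketch below shows the effect on a made-up 10-token example with hypothetical PRODUCT and TECH labels.

```python
from typing import List, Optional, Set, Tuple


def bio_to_spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Decode BIO tags into (start, end_exclusive, label) entity spans."""
    spans: Set[Tuple[int, int, str]] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        if tag.startswith("I-") and label == tag[2:]:
            continue  # the current entity continues
        if label is not None:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            # A stray I- tag is treated as the start of a new entity (lenient decoding).
            start, label = i, tag[2:]
    return spans


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match entity F1: a predicted span counts only if boundaries and label match."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up tags: the prediction truncates the PRODUCT entity by one token.
    gold = ["O", "B-PRODUCT", "I-PRODUCT", "O", "O", "O", "B-TECH", "I-TECH", "O", "O"]
    pred = ["O", "B-PRODUCT", "O", "O", "O", "O", "B-TECH", "I-TECH", "O", "O"]
    print(token_accuracy(gold, pred), entity_f1(gold, pred))  # → 0.9 0.5
```

One wrong token out of ten still yields 90% token accuracy, but it invalidates one of the two entities, halving entity-level F1.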

Bibliographic Information

Source
International Conference on Automation Information | 2014 | 4 pages
Authors
TAE-SEOK LEE; SEUNG-SHIK KANG
Original format: PDF
Chinese Library Classification: TP2-53
Keywords
Named-entity recognition; Tagged patent corpus; Optimal CRF features; Machine learning; Base-NP chunking; CRF++

Similar Documents

1. Extending a CRF-based named entity recognition model for Turkish well formed text and user generated content [J]. Şeker Gökhan Akın, Eryiğit Gülşen. Semantic Web, 2017, No. 5.
2. Domain-specific Named Entity Recognition with Document-Level Optimization [J]. Wang Limin, Li Shoushan, Yan Qian. ACM Transactions on Asian Language Information Processing, 2018, No. 4.
3. Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning [J]. Hua Xu, Hui Chen, Jingqi Wang. Database, 2016, No. 2010.
4. Optimizing the Features of CRF-based Named Entity Recognition for Patent Documents [C]. TAE-SEOK LEE, SEUNG-SHIK KANG. International Conference on Automation Information, 2014.
5. Advancing Biomedical Named Entity Recognition with Multivariate Feature Selection and Semantically Motivated Features [D]. Leaman, James Robert, Jr. 2013.
6. Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning [O]. Yaoyun Zhang, Jun Xu, Hui Chen. 2016.
7. Arabic named entity recognition using optimized feature sets [O]. Yassine Benajiba, Mona Diab, Paolo Rosso. 2008.