A Low-Cost Named Entity Recognition Research Based on Active Learning

Han Huang; Hongyu Wang; Dawei Jin

Abstract

Named entity recognition (NER) is an essential part of many natural language processing applications, such as information extraction, information retrieval, and intelligent question answering. This paper describes the development of the AL-CRF model, a NER approach based on active learning (AL). The AL-CRF model proceeds as follows. First, the samples are clustered using the k-means approach. Then, stratified sampling is performed on the resulting clusters to obtain initial samples, which are used to train the basic conditional random field (CRF) classifier. Next, the selection process begins, using entropy as its criterion: the samples with the highest entropy values are added to the training set. The learning process is then repeated, and the CRF classifier is retrained on the enlarged training set. The learning and selection processes of AL run iteratively until the F-score (the harmonic mean of precision and recall) stabilizes and the final NER model is obtained. Several NER experiments are performed on legislative and medical cases to validate the performance of AL-CRF. The test data include Chinese judicial documents and Chinese electronic medical records (EMRs). The tests indicate that the proposed algorithm achieves better recognition accuracy and recall than the conventional CRF model. Moreover, the main advantage of the approach is that it requires fewer manually labelled training samples while being more effective, which can make the process more cost-effective and more reliable.
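The sampling, selection, and stopping steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The k-means clustering and the CRF itself are left out: cluster assignments and per-token label probabilities are taken as given inputs. All function names are hypothetical. Scoring a sentence by its mean token entropy, and the tolerance and patience values in the stopping rule, are assumptions, since the abstract does not say how entropy is aggregated or how stabilization of F is measured.

```python
import math
import random
from collections import defaultdict


def token_entropy(probs):
    """Shannon entropy (in nats) of one token's label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sequence_entropy(marginals):
    """Mean token entropy of a sentence (an assumed aggregation rule).

    `marginals` is a list with one label distribution per token.
    """
    if not marginals:
        return 0.0
    return sum(token_entropy(p) for p in marginals) / len(marginals)


def stratified_sample(cluster_ids, fraction, seed=0):
    """Draw about `fraction` of the indices from every cluster (at least one each)."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        by_cluster[cid].append(idx)
    chosen = []
    for cid in sorted(by_cluster):
        members = by_cluster[cid]
        n = min(len(members), max(1, round(fraction * len(members))))
        chosen.extend(rng.sample(members, n))
    return sorted(chosen)


def select_most_uncertain(pool_marginals, k):
    """Return the keys of the k unlabelled sentences with the highest entropy."""
    ranked = sorted(
        pool_marginals,
        key=lambda key: sequence_entropy(pool_marginals[key]),
        reverse=True,
    )
    return ranked[:k]


def f_has_stabilized(f_history, tol=0.005, patience=2):
    """True once the last `patience` changes in F are all smaller than `tol`."""
    if len(f_history) <= patience:
        return False
    recent = f_history[-(patience + 1):]
    return all(abs(b - a) < tol for a, b in zip(recent, recent[1:]))


# Toy usage: three unlabelled sentences with two tokens and two labels each.
pool = {
    "s1": [[0.5, 0.5], [0.5, 0.5]],   # model is maximally unsure
    "s2": [[1.0, 0.0], [0.9, 0.1]],   # model is confident
    "s3": [[0.7, 0.3], [0.6, 0.4]],
}
print(select_most_uncertain(pool, 2))  # ['s1', 's3']
```

In a full loop, the sentences returned by `select_most_uncertain` would be labelled by hand, moved into the training set, and the classifier retrained, repeating until `f_has_stabilized` returns true on the recorded F-scores.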

Bibliographic record

Source: Scientific Programming, 2018, Issue 1
Authors: Han Huang; Hongyu Wang; Dawei Jin
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
