Uncertainty query sampling strategies for active learning of named entity recognition task

Agrawal Ankit; Tripathi Sarsij; Vardhan Manu

Abstract

Active learning is a well-known approach for labeling large unannotated datasets with minimal effort and in a cost-efficient way. It iteratively selects the most informative instances and adds them to the training set so that the learner's performance improves with each iteration. Named entity recognition (NER) is a key information extraction task in which the entities present in a sequence are labeled with their correct class. Traditional query sampling strategies for active learning consider only the model's final probability value when selecting the most informative instances. In this paper, we propose a new active learning algorithm based on a hybrid query sampling strategy that considers sentence similarity along with the model's final probability value. We compare it with active learning approaches based on four other well-known pool-based uncertainty query sampling strategies for NER: least confident sampling, margin of confidence sampling, ratio of confidence sampling, and entropy sampling. The experiments were performed on three biomedical NER datasets from different domains and on a Spanish-language NER dataset. We found that all of these approaches reach the performance of the supervised learning approach while requiring much less annotated training data. The proposed active learning algorithm performs well and, in most cases, further reduces the annotation cost compared with the active learning algorithms based on the other sampling strategies.
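The four baseline strategies named in the abstract can be illustrated with a minimal sketch. It uses the common textbook definitions of each uncertainty score, which may differ in detail from the formulations in the paper. The function names, the toy pool, and the choice to average token-level scores into a sentence-level score are assumptions made for illustration only. The hybrid strategy is not sketched, because the abstract does not say how sentence similarity is combined with the probability value.

```python
import math


def least_confidence(probs):
    # Uncertainty is high when the most likely label has low probability.
    return 1.0 - max(probs)


def margin_of_confidence(probs):
    # Uncertainty is high when the top two labels are close together.
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)


def ratio_of_confidence(probs):
    # Ratio of the second-best to the best probability; approaches 1 when tied.
    top, second = sorted(probs, reverse=True)[:2]
    return second / top


def entropy(probs):
    # Shannon entropy (natural log) of the label distribution.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sentence_score(token_probs, strategy):
    # Assumption: a sentence's uncertainty is the mean of its token scores.
    return sum(strategy(p) for p in token_probs) / len(token_probs)


def select_queries(pool, strategy, batch_size):
    # pool maps a sentence id to a list of per-token label distributions.
    # Returns the ids of the batch_size most uncertain sentences.
    ranked = sorted(
        pool,
        key=lambda sid: sentence_score(pool[sid], strategy),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical unlabeled pool: three two-token sentences, three labels each.
toy_pool = {
    "s1": [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]],
    "s2": [[0.4, 0.35, 0.25], [0.5, 0.3, 0.2]],
    "s3": [[0.6, 0.3, 0.1], [0.7, 0.2, 0.1]],
}

for strategy in (least_confidence, margin_of_confidence,
                 ratio_of_confidence, entropy):
    print(strategy.__name__, select_queries(toy_pool, strategy, 2))
```

In a pool-based loop, the selected sentences would be sent to an annotator, added to the training set, and the model retrained before the next round of scoring. On this toy pool all four scores rank the sentences in the same order; on real model outputs they can disagree.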

Bibliographic details

Source
Intelligent Decision Technologies, 2021, Issue 1, pp. 99-114 (16 pages)
Authors
Agrawal Ankit; Tripathi Sarsij; Vardhan Manu
Author affiliations

Natl Inst Technol Raipur Dept Comp Sci & Engn Raipur Chhattisgarh India;

Motilal Nehru Natl Inst Technol Allahabad Dept Comp Sci & Engn Prayagraj Uttar Pradesh India;

Natl Inst Technol Raipur Dept Comp Sci & Engn Raipur Chhattisgarh India;

Original format: PDF
Language: English
Keywords
Active learning; named entity recognition; uncertainty query sampling;
