A two-phase hybrid of semi-supervised and active learning approach for sequence labeling

Hamed Hassanzadeh; Mohammadreza Keyvanpour


Abstract

In recent years, many NLP systems have been developed using machine learning methods. To achieve the best performance, these systems are generally trained on a large human-annotated corpus. Because annotating such corpora is very expensive and time-consuming, manual annotation has become one of the significant issues in many text-based tasks such as text mining, semantic annotation, Named Entity Recognition and Information Extraction in general. Semi-supervised learning and active learning are two distinct approaches to reducing labeling costs. By their nature, active and semi-supervised learning can produce better results when applied jointly. In this paper we propose a combined semi-supervised and active learning approach for sequence labeling that greatly reduces manual annotation cost: only highly uncertain tokens need to be labeled manually, while other sequences and subsequences are labeled automatically. The proposed approach reduces manual annotation cost by around 90% compared with supervised learning and by 30% compared with a similar fully active learning approach. A Conditional Random Field (CRF) is chosen as the underlying learning model because of its promising performance in many sequence labeling tasks. In addition, we propose a confidence measure based on the model's variance reduction that reaches considerable accuracy in finding informative samples.
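The token-level split described in the abstract (query a human only for highly uncertain tokens, label the rest automatically) can be illustrated with a minimal sketch. This is not the authors' method: the paper's confidence measure is based on the model's variance reduction, whereas the sketch below swaps in a plain threshold on the most probable label's marginal probability. The function name `split_by_confidence`, the `threshold` value, and the example marginals are all hypothetical; in practice the marginals would come from a trained CRF.

```python
from typing import Dict, List, Optional, Tuple

# One label distribution per token, e.g. per-token marginals from a CRF.
Marginals = List[Dict[str, float]]


def split_by_confidence(
    marginals: Marginals, threshold: float = 0.9
) -> Tuple[List[Optional[str]], List[int]]:
    """Return (auto_labels, query_positions) for one sequence.

    Tokens whose most probable label reaches `threshold` keep that label.
    The remaining tokens are left as None and their positions are
    collected so that a human annotator can label them.
    NOTE: a simple probability threshold is an assumed stand-in for the
    paper's variance-reduction-based confidence measure.
    """
    auto_labels: List[Optional[str]] = []
    query_positions: List[int] = []
    for i, dist in enumerate(marginals):
        label, prob = max(dist.items(), key=lambda kv: kv[1])
        if prob >= threshold:
            auto_labels.append(label)
        else:
            auto_labels.append(None)
            query_positions.append(i)
    return auto_labels, query_positions


# Made-up marginals for a three-token sentence, for illustration only.
example = [
    {"B-PER": 0.97, "O": 0.03},
    {"B-PER": 0.55, "O": 0.45},
    {"O": 0.99, "B-PER": 0.01},
]
labels, to_annotate = split_by_confidence(example, threshold=0.9)
```

Here only the second token falls below the threshold, so it alone would be sent for manual annotation while the other two keep their automatically assigned labels.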


Bibliographic details

Source
Intelligent Data Analysis | 2013, Issue 2 | pp. 251-270 | 20 pages
Authors
Hamed Hassanzadeh; Mohammadreza Keyvanpour
Affiliations

Young Researchers Club, Qazvin Branch, Islamic Azad University, Qazvin, Iran;

Department of Computer Engineering, Alzahra University, Tehran, Iran

Indexed in: Science Citation Index (SCI), USA
Original format: PDF
Language: English
Keywords
Active learning; semi-supervised learning; sequence labeling; named entity recognition


Similar literature


1. Transductive active learning - A new semi-supervised learning approach based on iteratively refined generative models to capture structure in data [J]. Reitmaier Tobias, Calma Adrian, Sick Bernhard. Information Sciences: An International Journal. 2015.
2. Semi-supervised multi-label classification through topological active learning [J]. Benyettou Abdelkader, Bennani Younès, Benyettou Abdelkader. International Journal on Communications Antenna and Propagation. 2017, Issue 3.
3. A robust approach to model-based classification based on trimming and constraints: Semi-supervised learning in presence of outliers and label noise [J]. Advances in Data Analysis and Classification. 2020, Issue 2.
4. Semi-Supervised Active Learning for Sequence Labeling [C]. Katrin Tomanek, Udo Hahn. Joint conference of the annual meeting of the Association for Computational Linguistics; International joint conference on natural language processing of the Asian Federation of Natural Languages Processing; ACL 2009; IJCNLP 2009. 2009.
5. Learning from partially labeled data: Unsupervised and semi-supervised learning on graphs and learning with distribution shifting [D]. Huang, Jiayuan. 2007.
6. Active Semi-Supervised Learning Method with Hybrid Deep Belief Networks [O]. Shusen Zhou, Qingcai Chen, Xiaolong Wang.
7. Semi-Supervised Active Learning for Sequence Labeling [O]. Katrin Tomanek, Udo Hahn. 2010.
