The Use of Unlabeled Data Versus Labeled Data for Stopping Active Learning for Text Classification

Abstract

Annotation of training data is the major bottleneck in the creation of text classification systems. Active learning is a commonly used technique for reducing the amount of training data that must be labeled. A crucial aspect of active learning is determining when to stop labeling data. Three potential sources of information for deciding when to stop active learning are an additional labeled dataset, an unlabeled dataset, and the training data labeled during the active learning process itself. To date, no one has compared and contrasted the advantages and disadvantages of stopping methods based on these three information sources. We find that stopping methods that use unlabeled data are more effective than methods that use labeled data.
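The abstract contrasts stopping methods by their information source but does not spell out a particular rule. As a minimal sketch under stated assumptions, the Python below shows one criterion of each kind: an unlabeled-data criterion that stops when successive models' predictions on a fixed unlabeled "stop set" agree (measured with Cohen's kappa, in the spirit of stabilizing-predictions approaches), and a labeled-data criterion that stops when F1 on a separately labeled set stops improving. The class and function names, the 0.99 kappa threshold, the window of 3 rounds, and the patience of 3 rounds are hypothetical choices for illustration, not the paper's methods or settings. The third source (the training data labeled during active learning) is not modeled here.

```python
from collections import deque
from typing import Deque, Hashable, List, Optional, Sequence

# Illustrative sketch only: names, thresholds, and window sizes are
# hypothetical assumptions, not the settings used in the paper.


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    expected = sum((a.count(label) / n) * (b.count(label) / n)
                   for label in set(a) | set(b))
    if expected == 1.0:  # both models predict one identical class everywhere
        return 1.0
    return (observed - expected) / (1.0 - expected)


def f1_score(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 of pred against gold for the given positive class."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


class UnlabeledAgreementStopper:
    """Hypothetical criterion: stop once successive models agree on a stop set.

    Needs no extra annotation, only each round's predictions on a fixed pool
    of unlabeled documents.
    """

    def __init__(self, threshold: float = 0.99, window: int = 3) -> None:
        self.threshold = threshold
        self.window = window
        self._prev: Optional[List[Hashable]] = None
        self._kappas: Deque[float] = deque(maxlen=window)

    def update(self, stop_set_predictions: Sequence[Hashable]) -> bool:
        preds = list(stop_set_predictions)
        if self._prev is not None:
            self._kappas.append(cohens_kappa(self._prev, preds))
        self._prev = preds
        # Stop when the mean kappa over the last `window` comparisons is high.
        return (len(self._kappas) == self.window
                and sum(self._kappas) / self.window >= self.threshold)


class LabeledPlateauStopper:
    """Hypothetical criterion: stop once F1 on an extra labeled set plateaus."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.rounds_without_gain = 0

    def update(self, heldout_f1: float) -> bool:
        if heldout_f1 > self.best + self.min_delta:
            self.best = heldout_f1
            self.rounds_without_gain = 0
        else:
            self.rounds_without_gain += 1
        return self.rounds_without_gain >= self.patience


if __name__ == "__main__":
    # Toy predictions of successive models on a 6-document unlabeled stop set.
    stop_set_rounds = [[0, 1, 1, 0, 0, 1]] + [[0, 1, 0, 0, 1, 1]] * 4
    unlabeled = UnlabeledAgreementStopper()
    for i, preds in enumerate(stop_set_rounds):
        if unlabeled.update(preds):
            print(f"unlabeled-data criterion: stop after round {i}")
            break

    # Toy predictions of the same rounds on a small extra labeled set.
    gold = [1, 0, 1, 1, 0]
    labeled_rounds = [[1, 1, 0, 0, 0], [1, 0, 1, 0, 0], [1, 0, 1, 0, 0],
                      [1, 1, 1, 0, 0], [1, 0, 1, 0, 0]]
    labeled = LabeledPlateauStopper()
    for i, preds in enumerate(labeled_rounds):
        if labeled.update(f1_score(gold, preds)):
            print(f"labeled-data criterion: stop after round {i}")
            break
```

On these toy inputs both criteria fire at round 4. The difference the abstract turns on is visible in the signatures: `UnlabeledAgreementStopper.update` consumes only model predictions on unlabeled documents, whereas the labeled criterion needs gold labels for an additional annotated set before it can compute F1.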

Bibliographic Record

Source: IEEE International Conference on Semantic Computing | 2019 | pp. 287-294 | 8 pages
Authors: Garrett Beatty; Ethan Kochis; Michael Bloodgood
Original format: PDF
Keywords: Training; Training data; Labeling; Data models; Support vector machines; Computer science; Machine learning algorithms

机译：培训;训练数据;标签;数据模型;支持向量机;计算机科学;机器学习算法;
Date added to database: 2022-08-26 13:53:16

Similar Literature

1. Learning model order from labeled and unlabeled data for partially supervised classification, with application to word sense disambiguation [J]. Zheng-Yu Niu, Dong-Hong Ji, Chew Lim Tan. Computer Speech and Language. 2007, No. 4.

2. Bidirectional Active Learning: A Two-Way Exploration Into Unlabeled and Labeled Data Set [J]. Zhang Xiao-Yu, Wang Shupeng, Yun Xiaochun. IEEE Transactions on Neural Networks and Learning Systems. 2015, No. 12.

3. Semi-supervised text categorization: Exploiting unlabeled data using ensemble learning algorithms [J]. Mohammad Reza Keyvanpour, Maryam Bahojb Imani. Intelligent Data Analysis. 2013, No. 3.

4. The Use of Unlabeled Data Versus Labeled Data for Stopping Active Learning for Text Classification [C]. Garrett Beatty, Ethan Kochis, Michael Bloodgood. IEEE International Conference on Semantic Computing. 2019.

5. Active learning with support vector machines for imbalanced datasets and a method for stopping active learning based on stabilizing predictions [D]. Michael Bloodgood. 2009.

6. Clinical Document Classification Using Labeled and Unlabeled Data Across Hospitals [O]. Hamed Hassanzadeh, Mahnoosh Kholghi, Anthony Nguyen. 2018.

7. The Use of Unlabeled Data Versus Labeled Data for Stopping Active Learning for Text Classification [O]. Garrett Beatty, Ethan Kochis, Michael Bloodgood. 2019.

8. Using Unlabeled Data to Improve Text Classification [R]. K. P. Nigam. 2001.