Conference: IEEE International Conference on Semantic Computing

The Use of Unlabeled Data Versus Labeled Data for Stopping Active Learning for Text Classification


Abstract

Annotation of training data is the major bottleneck in the creation of text classification systems. Active learning is a commonly used technique to reduce the amount of training data one needs to label. A crucial aspect of active learning is determining when to stop labeling data. Three potential sources for informing when to stop active learning are an additional labeled set of data, an unlabeled set of data, and the training data that is labeled during the process of active learning. To date, no one has compared and contrasted the advantages and disadvantages of stopping methods based on these three information sources. We find that stopping methods that use unlabeled data are more effective than methods that use labeled data.
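To make the idea of stopping on unlabeled data concrete, here is a minimal sketch. It assumes a fixed unlabeled "stop set" that each successive model predicts on, and it stops once those predictions stop changing. The function names, the raw-agreement measure, and the `threshold` and `window` values are illustrative assumptions, not the specific methods evaluated in the paper.

```python
from typing import List, Sequence


def agreement(prev: Sequence[int], curr: Sequence[int]) -> float:
    """Fraction of stop-set items on which two successive models agree."""
    if len(prev) != len(curr):
        raise ValueError("prediction lists must have the same length")
    if not prev:
        raise ValueError("stop set must not be empty")
    return sum(p == c for p, c in zip(prev, curr)) / len(prev)


def should_stop(
    history: List[Sequence[int]],
    threshold: float = 0.99,  # assumed value, tune per task
    window: int = 3,          # assumed value, tune per task
) -> bool:
    """Decide whether to stop labeling.

    `history` holds one list of predicted labels per active-learning round,
    all made on the same unlabeled stop set. Stop only when each of the last
    `window` consecutive pairs of rounds agrees at or above `threshold`.
    """
    if len(history) < window + 1:
        return False  # not enough rounds to judge stability yet
    recent = history[-(window + 1):]
    scores = [agreement(a, b) for a, b in zip(recent, recent[1:])]
    return min(scores) >= threshold


# Hypothetical predictions from four rounds on a four-item stop set.
rounds = [[0, 1, 0, 1], [0, 1, 1, 1], [0, 1, 1, 1], [0, 1, 1, 1]]
print(should_stop(rounds, threshold=1.0, window=2))
```

No extra labels are needed for this check, since only the models' predictions on the stop set are compared. Raw agreement is used here for simplicity; a chance-corrected statistic such as Cohen's kappa could be substituted.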
