Multi-Domain Active Learning for Text Classification

Lianghao Li; Xiaoming Jin; Sinno Jialin Pan; Jian-Tao Sun

Abstract

Active learning has been shown to reduce the labeling effort required for supervised learning. However, existing active learning work has mainly focused on training models for a single domain. In practical applications, it is common to train classifiers for multiple domains simultaneously. For example, a merchant web site such as Amazon.com may need a set of classifiers to predict the sentiment polarity of product reviews collected from various domains (e.g., electronics, books, shoes). Although each domain has its own unique features, different domains may share some common latent features. If active learning is applied to each domain separately, some data instances selected from different domains may carry duplicate knowledge because of these common features. How to choose which data from multiple domains to label is therefore crucial to further reducing human labeling effort in multi-domain learning. In this paper, we propose a novel multi-domain active learning framework that jointly selects data instances from all domains while accounting for duplicate information. In our solution, a shared subspace is first learned to represent the common latent features of the different domains. By considering the common and the domain-specific features together, the model loss reduction induced by each data instance can be decomposed into a common part and a domain-specific part. In this way, the duplicate information across domains is encoded in the common part of the model loss reduction and taken into account when querying. We compare our method with state-of-the-art active learning approaches on several text classification tasks: sentiment classification, newsgroup classification, and email spam filtering. The experimental results show that our method reduces human labeling effort by 33.2%, 42.9%, and 68.7% on the three tasks, respectively.
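The joint selection idea described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give the loss-reduction formulas, so the sketch assumes each candidate already comes with a hypothetical `common_gain` and `specific_gain` (standing in for the two parts of the model loss reduction) and a hypothetical `shared` vector (standing in for its representation in the shared subspace). The cosine-based discount on the common part is likewise an assumption, chosen only to show how duplicate information across domains could lower the value of a query.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_batch(candidates, batch_size):
    """Greedily pick instances to label from the pooled candidates of all domains.

    Each candidate is a dict with hypothetical keys:
      id            -- identifier of the unlabeled instance
      domain        -- name of the domain it belongs to
      shared        -- its coordinates in an (assumed) shared subspace
      common_gain   -- assumed common part of the loss reduction
      specific_gain -- assumed domain-specific part of the loss reduction
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < batch_size:

        def score(c):
            # Overlap with already-selected instances, measured in the shared
            # subspace; negative similarities are treated as no overlap.
            overlap = max(
                (max(0.0, cosine(c["shared"], s["shared"])) for s in selected),
                default=0.0,
            )
            # Only the common part is discounted: the domain-specific part
            # cannot be duplicated by instances from other domains.
            return (1.0 - overlap) * c["common_gain"] + c["specific_gain"]

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [c["id"] for c in selected]


# Toy data: "a" and "b" come from different domains but coincide in the
# shared subspace, so labeling both would duplicate the common knowledge.
candidates = [
    {"id": "a", "domain": "books", "shared": [1.0, 0.0],
     "common_gain": 1.0, "specific_gain": 0.2},
    {"id": "b", "domain": "electronics", "shared": [1.0, 0.0],
     "common_gain": 0.9, "specific_gain": 0.1},
    {"id": "c", "domain": "shoes", "shared": [0.0, 1.0],
     "common_gain": 0.5, "specific_gain": 0.3},
]

print(select_batch(candidates, 2))  # → ['a', 'c']
```

Ranking each candidate independently by `common_gain + specific_gain` would pick "a" and "b" here. Once "a" is chosen, the joint scheme discounts the common part of "b" to zero and prefers "c", which is the kind of cross-domain redundancy the abstract argues should be taken into account.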

Bibliographic record

Source: SIGKDD Explorations, 2012, CD-ROM issue, 9 pages
Authors: Lianghao Li; Xiaoming Jin; Sinno Jialin Pan; Jian-Tao Sun
Original format: PDF
Language: English
Chinese Library Classification: TP274.2
Keywords: Active Learning; Transfer Learning; Text Classification

