Domain Adaptation for Text Categorization by Feature Labeling

Abstract

We present a novel approach to domain adaptation for text categorization, which merely requires that the source domain data are weakly annotated in the form of labeled features. The main advantage of our approach is that labeling words is less expensive than labeling documents. We propose two methods, the first of which seeks to minimize the divergence between the distributions of the source domain, which contains labeled features, and the target domain, which contains only unlabeled data. The second method augments the labeled feature set in an unsupervised way, via the discovery of a shared latent concept space between source and target. We empirically show that our approach outperforms standard supervised and semi-supervised methods, and obtains results competitive with those reported by state-of-the-art domain adaptation methods, while requiring considerably less supervision.
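
The abstract gives no implementation details, so the following is only a rough sketch of the general idea of learning from labeled features with a generalized expectation criterion (the technique named in the keywords). It is not the authors' method. The toy vocabulary, the reference distributions, the function names (`ge_loss_and_grad`, `fit_ge`), and all hyperparameters are assumptions made for illustration. A softmax classifier is trained on unlabeled documents only, so that the average predicted class distribution over documents containing a labeled word stays close, in KL divergence, to that word's reference distribution.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with the usual max-subtraction for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ge_loss_and_grad(W, X, labeled, l2=0.01):
    """KL-based expectation loss over labeled features, plus L2 penalty.

    labeled maps a feature index to an assumed reference class
    distribution (strictly positive, sums to 1).
    """
    P = softmax(X @ W)                       # per-document class posteriors
    G = np.zeros_like(P)                     # dLoss/dP
    loss = l2 * np.sum(W * W)
    for k, ref in labeled.items():
        mask = X[:, k] > 0                   # documents containing feature k
        if not mask.any():
            continue
        q = P[mask].mean(axis=0)             # model expectation for feature k
        loss += np.sum(ref * (np.log(ref) - np.log(q)))   # KL(ref || q)
        G[mask] += -(ref / q) / mask.sum()
    # Backpropagate through the softmax.
    dZ = P * (G - np.sum(G * P, axis=1, keepdims=True))
    grad = X.T @ dZ + 2 * l2 * W
    return loss, grad

def fit_ge(X, labeled, n_classes, steps=1000, lr=0.2, l2=0.01):
    # Plain gradient descent; step count and learning rate are arbitrary.
    W = np.zeros((X.shape[1], n_classes))
    for _ in range(steps):
        _, grad = ge_loss_and_grad(W, X, labeled, l2)
        W -= lr * grad
    return W

# Toy vocabulary (hypothetical): 0="goal", 1="match", 2="vote", 3="party".
# Classes: 0=sports, 1=politics. No document carries a label.
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
    [0, 2, 0, 0],   # contains no labeled word
    [0, 0, 0, 2],   # contains no labeled word
], dtype=float)

# Only two words are labeled, each with an assumed reference distribution.
labeled = {
    0: np.array([0.9, 0.1]),   # "goal" mostly indicates sports
    2: np.array([0.1, 0.9]),   # "vote" mostly indicates politics
}

W = fit_ge(X, labeled, n_classes=2)
preds = softmax(X @ W).argmax(axis=1)
print(preds)
```

In this toy setup, the unlabeled words "match" and "party" pick up class weight only through co-occurrence with the labeled words, which is why the last two documents can still be assigned a class. Adapting across domains, as the abstract describes, would additionally involve unlabeled target-domain documents, which this sketch does not model.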

Bibliographic record

Source: Advances in information retrieval, 2011, pp. 424-435 (12 pages)
Conference location: Dublin (IE)
Authors: Cristina Kadar; Jose Iria
Affiliation: IBM Research Zurich, Säumerstrasse 4, CH-8804 Rüschlikon, Switzerland (both authors)
Original format: PDF
Language: English
Chinese Library Classification: Information processing
Keywords: domain adaptation; generalized expectation criteria; weakly-supervised latent Dirichlet allocation
Added to database: 2022-08-26 13:47:04

