Exploiting Associations between Word Clusters and Document Classes for Cross-domain Text Categorization

Fuzhen Zhuang; Ping Luo; Hui Xiong; Qing He; Yuhong Xiong; Zhongzhi Shi

Abstract

Cross-domain text categorization aims to adapt the knowledge learnt from a labeled source domain to an unlabeled target domain, where the documents from the source and target domains are drawn from different distributions. However, although the distributions of raw-word features differ, the associations between word clusters (conceptual features) and document classes may remain stable across domains. In this paper, we exploit these unchanged associations as a bridge for transferring knowledge from the source domain to the target domain via non-negative matrix tri-factorization. Specifically, we formulate a joint optimization framework over two matrix tri-factorizations, one for the source-domain data and one for the target-domain data, in which the associations between word clusters and document classes are shared. We then give an iterative algorithm for this optimization and theoretically show its convergence. Comprehensive experiments demonstrate the effectiveness of the method. In particular, we show that the proposed method can handle some difficult scenarios in which baseline methods usually do not perform well.
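
The joint factorization described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' exact algorithm: it assumes a shared vocabulary across domains, a squared-error objective ||X_s - F_s S G_s^T||^2 + ||X_t - F_t S G_t^T||^2, source labels held fixed as a one-hot matrix G_s, and plain multiplicative updates without the normalization constraints a full method would likely impose. The function name `joint_tri_factorization` and all parameter names are hypothetical.

```python
import numpy as np


def joint_tri_factorization(X_s, G_s, X_t, k, n_iter=100, eps=1e-9, seed=0):
    """Sketch of two non-negative tri-factorizations sharing one matrix S.

    X_s : (m, n_s) non-negative word-by-document matrix, source domain
    G_s : (n_s, c) one-hot source labels, kept fixed (assumption)
    X_t : (m, n_t) non-negative word-by-document matrix, target domain
    k   : number of word clusters
    Returns F_s, F_t (word-to-cluster), S (cluster-to-class, shared),
    G_t (target document-to-class) and the loss after each iteration.
    """
    rng = np.random.default_rng(seed)
    m, c = X_s.shape[0], G_s.shape[1]
    n_t = X_t.shape[1]
    G_s = np.asarray(G_s, dtype=float)

    # Random positive initialization of all free factors.
    F_s = rng.random((m, k)) + eps
    F_t = rng.random((m, k)) + eps
    S = rng.random((k, c)) + eps
    G_t = rng.random((n_t, c)) + eps

    history = []
    for _ in range(n_iter):
        # Word-cluster factors, one per domain (S and G held fixed).
        F_s *= (X_s @ G_s @ S.T) / (F_s @ (S @ (G_s.T @ G_s) @ S.T) + eps)
        F_t *= (X_t @ G_t @ S.T) / (F_t @ (S @ (G_t.T @ G_t) @ S.T) + eps)

        # Target document-class factor; the source one stays at its labels.
        G_t *= (X_t.T @ F_t @ S) / (G_t @ (S.T @ (F_t.T @ F_t) @ S) + eps)

        # Shared association matrix: both domains contribute to the update.
        num = F_s.T @ X_s @ G_s + F_t.T @ X_t @ G_t
        den = (F_s.T @ F_s) @ S @ (G_s.T @ G_s) + (F_t.T @ F_t) @ S @ (G_t.T @ G_t)
        S *= num / (den + eps)

        loss = (np.linalg.norm(X_s - F_s @ S @ G_s.T) ** 2
                + np.linalg.norm(X_t - F_t @ S @ G_t.T) ** 2)
        history.append(loss)

    return F_s, F_t, S, G_t, history
```

Under these assumptions, a predicted class for each target document can be read off as `G_t.argmax(axis=1)`. Because S is the only factor that appears in both reconstruction terms, it is the single channel through which the source labels influence the target predictions.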

Bibliographic details

Source: Statistical Analysis and Data Mining, 2011, Issue 1, 15 pages
Authors: Fuzhen Zhuang; Ping Luo; Hui Xiong; Qing He; Yuhong Xiong; Zhongzhi Shi
Original format: PDF
Language: English
Chinese Library Classification: Economic statistics
Keywords: Cross-domain learning; Domain adaption; Transfer learning; Text categorization
Date indexed: 2022-08-18 15:14:16
