Ensemble multi-label text categorization based on rotation forest and latent semantic indexing

Elghazel Haytham; Aussem Alex; Gharroudi Ouadie; Saadaoui Wafa


Abstract

Text categorization has gained increasing popularity in recent years due to the explosive growth of multimedia documents. As a document can be associated with multiple non-exclusive categories simultaneously (e.g., Virus, Health, Sports, and Olympic Games), text categorization provides many opportunities for developing novel multi-label learning approaches devoted specifically to textual data. In this paper, we propose an ensemble multi-label classification method for text categorization based on four key ideas: (1) performing Latent Semantic Indexing based on distinct orthogonal projections on lower-dimensional spaces of concepts; (2) random splitting of the vocabulary; (3) document bootstrapping; and (4) the use of BoosTexter as a powerful multi-label base learner for text categorization to simultaneously encourage diversity and individual accuracy in the committee. Diversity of the ensemble is promoted through random splits of the vocabulary that lead to different orthogonal projections on lower-dimensional latent concept spaces. Accuracy of the committee members is promoted through the underlying latent semantic structure uncovered in the text. The combination of rotation-based ensemble construction and Latent Semantic Indexing projection is shown to bring about significant improvements in terms of Average Precision, Coverage, Ranking loss and One error compared to five state-of-the-art approaches across 14 real-world textual data sets covering a wide variety of topics including health, education, business, science and arts. (C) 2016 Elsevier Ltd. All rights reserved.
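The four ideas listed in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`project`, `fit_ensemble`, `predict_scores`) and all parameter values are hypothetical, the choice to fit both the LSI projections and the base learner on the same bootstrap sample is an assumption, and a simple ridge regression is swapped in for BoosTexter, which is not reproduced here.

```python
import numpy as np


def project(X, projections):
    """Map documents onto the concatenated concept spaces of one member."""
    return np.hstack([X[:, terms] @ V for terms, V in projections])


def fit_ensemble(X, Y, n_members=10, n_splits=3, n_concepts=2, ridge=1e-2, seed=0):
    """X: (n_docs, n_terms) term weights; Y: (n_docs, n_labels) 0/1 labels."""
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    members = []
    for _ in range(n_members):
        # Idea (2): randomly split the vocabulary into disjoint term subsets.
        subsets = np.array_split(rng.permutation(n_terms), n_splits)
        # Idea (3): bootstrap the documents (sample with replacement).
        boot = rng.integers(0, n_docs, size=n_docs)
        Xb, Yb = X[boot], Y[boot]
        projections = []
        for terms in subsets:
            # Idea (1): LSI via SVD on this subset; the top right singular
            # vectors form an orthonormal basis of a small concept space.
            _, _, Vt = np.linalg.svd(Xb[:, terms], full_matrices=False)
            projections.append((terms, Vt[:n_concepts].T))
        # Idea (4), replaced here: ridge regression stands in for BoosTexter
        # (an assumption made only to keep the sketch self-contained).
        Z = np.hstack([project(Xb, projections), np.ones((n_docs, 1))])
        W = np.linalg.solve(Z.T @ Z + ridge * np.eye(Z.shape[1]), Z.T @ Yb)
        members.append((projections, W))
    return members


def predict_scores(members, X):
    """Average the per-label scores of all committee members."""
    ones = np.ones((X.shape[0], 1))
    scores = [np.hstack([project(X, p), ones]) @ W for p, W in members]
    return np.mean(scores, axis=0)


# Toy data (random, for illustration only): 20 documents, 12 terms, 3 labels.
rng = np.random.default_rng(1)
X = rng.random((20, 12))
Y = (rng.random((20, 3)) < 0.4).astype(float)
members = fit_ensemble(X, Y)
scores = predict_scores(members, X)
ranking = np.argsort(-scores, axis=1)  # labels ordered by decreasing score
```

The sketch returns real-valued label scores rather than hard decisions because the metrics named in the abstract (Average Precision, Coverage, Ranking loss, One error) are all computed from a ranking of the labels.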


Bibliographic details

Source
Expert Systems with Applications, 2016, No. 9, pp. 1-11 (11 pages)
Authors
Elghazel Haytham; Aussem Alex; Gharroudi Ouadie; Saadaoui Wafa
Affiliations

Univ Lyon 1, LIRIS UMR CNRS 5205, F-69622 Villeurbanne, France (all four authors)

Original format: PDF
Language: English
Keywords
Multi-label classification; Text categorization; Ensemble learning; Rotation forest; Content analysis and indexing;

