International Conference on Electrical Engineering and Informatics

Handling imbalanced dataset in multi-label text categorization using Bagging and Adaptive Boosting



Abstract

Imbalanced datasets occur because data in the real world are unevenly distributed, as in the disposition of complaints to government offices in Bandung. Consequently, multi-label text categorization algorithms may not deliver their best performance, because classifiers tend to be dominated by the majority of the data and to ignore the minority. In this paper, the Bagging and Adaptive Boosting algorithms are employed to handle this issue and improve the performance of text categorization. The results are evaluated with four metrics: hamming loss, subset accuracy, example-based accuracy, and micro-averaged f-measure. Bagging.ML-LP with an SMO weak classifier performs best in terms of subset accuracy and example-based accuracy. Bagging.ML-BR with an SMO weak classifier has the best micro-averaged f-measure overall. On the other hand, AdaBoost.MH with a J48 weak classifier achieves the lowest hamming loss. Thus, both algorithms have high potential to boost the performance of text categorization, but only with certain weak classifiers. However, bagging shows more potential than adaptive boosting in increasing the accuracy of minority labels.
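The abstract names four multi-label evaluation metrics. A minimal sketch of their standard definitions, assuming labels are encoded as binary indicator vectors (one list per example); the implementation below follows the textbook formulas, not code from the paper:

```python
# Multi-label metrics over binary indicator vectors.
# Y_true / Y_pred: list of examples, each a list of 0/1 label flags.

def hamming_loss(Y_true, Y_pred):
    # Fraction of label positions predicted incorrectly,
    # averaged over all examples and labels (lower is better).
    n, q = len(Y_true), len(Y_true[0])
    wrong = sum(t != p
                for yt, yp in zip(Y_true, Y_pred)
                for t, p in zip(yt, yp))
    return wrong / (n * q)

def subset_accuracy(Y_true, Y_pred):
    # Fraction of examples whose predicted label set matches exactly.
    return sum(yt == yp for yt, yp in zip(Y_true, Y_pred)) / len(Y_true)

def example_based_accuracy(Y_true, Y_pred):
    # Jaccard overlap |T ∩ P| / |T ∪ P|, averaged over examples.
    total = 0.0
    for yt, yp in zip(Y_true, Y_pred):
        inter = sum(t and p for t, p in zip(yt, yp))
        union = sum(t or p for t, p in zip(yt, yp))
        total += inter / union if union else 1.0
    return total / len(Y_true)

def micro_f1(Y_true, Y_pred):
    # F-measure from true/false positive and negative counts
    # pooled globally across all examples and labels.
    pairs = [(t, p)
             for yt, yp in zip(Y_true, Y_pred)
             for t, p in zip(yt, yp)]
    tp = sum(t and p for t, p in pairs)
    fp = sum((not t) and p for t, p in pairs)
    fn = sum(t and (not p) for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0
```

For example, with `Y_true = [[1, 0, 1], [0, 1, 0]]` and `Y_pred = [[1, 0, 0], [0, 1, 0]]` (toy data, not from the paper), one label position out of six is wrong, so the hamming loss is 1/6, while only the second example matches exactly, giving a subset accuracy of 0.5.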
