Handling imbalanced classification problem: A case study on social media datasets

Nguyen Tuong Tri; Hwang Dosam; Jung Jason J.

Abstract

The imbalanced data problem occurs when the number of representative instances for classes of interest is much lower than for other classes. The influence of imbalanced data on classification performance has been discussed in previous research as an open challenge. In this paper, we propose a method to solve the imbalanced data problem by focusing on preprocessing, including: i) sampling techniques (i.e., under-sampling, over-sampling, and hybrid-sampling) and ii) the instance weighting method to increase the number of features in minority classes and to reduce comprehensive coverage in majority classes. The experimental results show that noisy data is reduced, yielding a smaller dataset, and that training time decreases significantly. Moreover, distinct properties of each class are examined effectively. The refined data is used as training input for Naive Bayes and support vector machine classifiers. The proposed methods are evaluated based on the number of non-geotagged resources that are labeled correctly with their geo-locations. In comparison with previous research, the proposed method achieves an accuracy of 84%, compared with 75% for previous results.
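The abstract names three sampling strategies and an instance weighting step but does not give their details. The following is a minimal sketch of the generic ideas only, assuming plain random resampling and inverse-frequency weights; it is not the authors' method. All function names, the toy dataset, and the choice of the mean class size as the hybrid target are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def group_by_label(samples):
    """Group (features, label) pairs by their label."""
    groups = defaultdict(list)
    for features, label in samples:
        groups[label].append((features, label))
    return groups


def random_under_sample(samples, rng):
    """Shrink every class to the size of the smallest class."""
    groups = group_by_label(samples)
    target = min(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out.extend(rng.sample(g, target))  # draw without replacement
    return out


def random_over_sample(samples, rng):
    """Grow every class to the size of the largest class by duplication."""
    groups = group_by_label(samples)
    target = max(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out.extend(g)
        out.extend(rng.choices(g, k=target - len(g)))  # draw with replacement
    return out


def hybrid_sample(samples, rng):
    """Move every class to the mean class size (assumed target)."""
    groups = group_by_label(samples)
    target = round(sum(len(g) for g in groups.values()) / len(groups))
    out = []
    for g in groups.values():
        if len(g) >= target:
            out.extend(rng.sample(g, target))
        else:
            out.extend(g)
            out.extend(rng.choices(g, k=target - len(g)))
    return out


def inverse_frequency_weights(samples):
    """Weight each instance by n / (k * count(label)), so rare classes count more."""
    counts = Counter(label for _, label in samples)
    n, k = len(samples), len(counts)
    return [n / (k * counts[label]) for _, label in samples]


# Toy imbalanced dataset: 90 majority instances, 10 minority instances.
rng = random.Random(0)
data = [((i,), "majority") for i in range(90)] + [((i,), "minority") for i in range(10)]

under = random_under_sample(data, rng)
over = random_over_sample(data, rng)
hybrid = hybrid_sample(data, rng)
weights = inverse_frequency_weights(data)
```

Resampling changes the training set itself, whereas weighting leaves the data untouched and instead changes how much each instance contributes during training. Either output could then be passed to a classifier such as Naive Bayes or a support vector machine, which are the two classifier families the abstract mentions.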

Bibliographic record

Source
Journal of intelligent & fuzzy systems: Applications in Engineering and Technology | 2017, Issue 2 | 12 pages
Authors
Nguyen Tuong Tri; Hwang Dosam; Jung Jason J.
Author affiliations

Yeungnam Univ Dept Comp Engn Gyongsan South Korea;

Yeungnam Univ Dept Comp Engn Gyongsan South Korea;

Chung Ang Univ Dept Comp Engn Seoul South Korea;

Indexing information
Original format: PDF
Language: eng
Chinese Library Classification: Automation systems
Keywords
Imbalanced datasets; geotags resources; sampling method; instance weighting; location prediction;

Date added to database: 2022-08-20 10:32:57

Similar literature

1. Handling imbalanced classification problem: A case study on social media datasets [J]. Nguyen Tuong Tri, Hwang Dosam, Jung Jason J. Journal of intelligent & fuzzy systems: Applications in Engineering and Technology. 2017, Issue 2

2. KerMinSVM for imbalanced datasets with a case study on arabic comics classification [J]. Ammar Nayal, Hadi Jomaa, Mariette Awad Engineering Applications of Artificial Intelligence. 2017, Issue mara

3. Behavior classification of goats using 9-axis multi sensors: The effect of imbalanced datasets on classification performance [J]. Sakai Koki, Oishi Kazato, Miwa Masafumi, Computers and Electronics in Agriculture. 2019

4. Predictive Models with Resampling: A Comparative Study of Machine Learning Algorithms and their Performances on Handling Imbalanced Datasets [C]. Adithi D. Chakravarthy, Sindhura Bonthu, Zhengxin Chen, IEEE International Conference on Machine Learning and Applications. 2019

5. Classifier design to improve pattern classification and knowledge discovery for imbalanced datasets. [D]. Wang, Kun. 2009

6. Convolutional Rebalancing Network for the Classification of Large Imbalanced Rice Pest and Disease Datasets in the Field [O]. Guofeng Yang, Guipeng Chen, Cong Li, 2021

7. A First Experimental Study on Functional Dependencies for Imbalanced Datasets Classification [O]. Marie Le Guilly, Jean-Marc Petit, Marian Scuturici 2019
