Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology

Handling imbalanced classification problem: A case study on social media datasets

Abstract

The imbalanced data problem occurs when the classes of interest have far fewer representative instances than the other classes. Previous research has identified the influence of imbalanced data on classification performance as an open challenge. In this paper, we propose a method that addresses the problem at the preprocessing stage using: i) sampling techniques (under-sampling, over-sampling, and hybrid sampling) and ii) an instance weighting method, in order to increase the number of features in minority classes and to reduce comprehensive coverage in majority classes. The experimental results show that noisy data is reduced, yielding a smaller dataset, and that training time decreases significantly. Moreover, the distinct properties of each class are examined effectively. The refined data is used as training input for Naive Bayes and support vector machine classifiers. The proposed methods are evaluated by the number of non-geotagged resources that are correctly labeled with their geo-locations. The proposed method achieves an accuracy of 84%, compared with 75% in previous research.
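The abstract names under-sampling, over-sampling, and instance weighting but does not say which variants the authors used. As a rough illustration only, the sketch below assumes the simplest forms: random under-sampling, random over-sampling with replacement, and inverse-class-frequency weights. The function names and the toy labels `"majority"` and `"minority"` are hypothetical and do not come from the paper.

```python
import random
from collections import Counter, defaultdict


def group_by_label(samples):
    """Group (features, label) pairs by their label."""
    groups = defaultdict(list)
    for features, label in samples:
        groups[label].append((features, label))
    return groups


def random_under_sample(samples, seed=0):
    """Assumed variant: shrink every class to the size of the smallest class."""
    rng = random.Random(seed)
    groups = group_by_label(samples)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))  # draw without replacement
    return balanced


def random_over_sample(samples, seed=0):
    """Assumed variant: grow every class to the size of the largest class."""
    rng = random.Random(seed)
    groups = group_by_label(samples)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # Duplicate randomly chosen instances until the class reaches the target size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


def inverse_frequency_weights(samples):
    """Assumed weighting scheme: weight = n / (num_classes * class_count)."""
    counts = Counter(label for _, label in samples)
    n, k = len(samples), len(counts)
    return [n / (k * counts[label]) for _, label in samples]


# Toy dataset: 8 majority instances and 2 minority instances.
toy = [((i,), "majority") for i in range(8)] + [((i,), "minority") for i in range(2)]
under = random_under_sample(toy)
over = random_over_sample(toy)
weights = inverse_frequency_weights(toy)
```

On this toy set, under-sampling leaves 2 instances per class, over-sampling leaves 8 per class, and the weights are 0.625 for each majority instance and 2.5 for each minority instance, so both classes carry the same total weight. A hybrid scheme would combine the two sampling steps, for example by under-sampling the majority class partway and over-sampling the minority class to meet it.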