IEEE/ACIS International Conference on Software Engineering Research, Management and Applications

Analysis of Focussed Under-Sampling Techniques with Machine Learning Classifiers



Abstract

The class imbalance problem is a major issue in machine learning: it produces biased classifiers that work well for the majority class but perform relatively poorly on the minority class. To build accurate prediction models, the class imbalance problem must be addressed. In this paper, it is handled with focused under-sampling techniques, namely Cluster Based, Tomek Link and Condensed Nearest Neighbours, which equalize the number of instances of the two classes by under-sampling the majority class according to specific criteria. This is in contrast to random under-sampling, where data samples are selected randomly from the majority class, leading to underfitting and the loss of important data points. To fairly compare and evaluate the performance of the focused under-sampling approaches, prediction models are constructed using popular machine learning classifiers: K-Nearest Neighbor, Decision Tree and Naive Bayes. The results show that Decision Tree outperformed the other machine learning techniques, and a comparison of the under-sampling approaches for Decision Tree found Condensed Nearest Neighbours to be the best among them.
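The paper's own code is not reproduced here; the following is a minimal sketch of the kind of pipeline the abstract describes, assuming the scikit-learn and imbalanced-learn libraries and a synthetic imbalanced dataset. ClusterCentroids stands in for the cluster-based under-sampler, and the dataset, parameters and evaluation metric are illustrative choices, not the paper's.

```python
# Sketch: focused under-sampling of the majority class, then training the
# three classifiers named in the abstract on the resampled data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from imblearn.under_sampling import ClusterCentroids, TomekLinks, CondensedNearestNeighbour

# Synthetic imbalanced data: roughly 90% majority class, 10% minority class.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=42)

samplers = {
    "Cluster Based": ClusterCentroids(random_state=42),
    "Tomek Link": TomekLinks(),
    "Condensed Nearest Neighbours": CondensedNearestNeighbour(random_state=42),
}
classifiers = {
    "K-Nearest Neighbor": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
}

for s_name, sampler in samplers.items():
    # Under-sample the majority class of the training split only;
    # the test split keeps its original class distribution.
    X_res, y_res = sampler.fit_resample(X_train, y_train)
    for c_name, clf in classifiers.items():
        clf.fit(X_res, y_res)
        score = f1_score(y_test, clf.predict(X_test))  # F1 on the minority class
        print(f"{s_name} + {c_name}: minority-class F1 = {score:.3f}")
```

Resampling only the training split keeps the evaluation honest: the classifiers are compared on test data whose imbalance matches the original problem.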
