Conference: IEEE International Conference on Machine Learning and Applications

Predictive Models with Resampling: A Comparative Study of Machine Learning Algorithms and their Performances on Handling Imbalanced Datasets

Abstract

Class imbalance is a crucial challenge in many real-world machine learning applications. Traditional machine learning algorithms are likely to produce good accuracy scores on imbalanced datasets because they are biased towards the majority class. Accuracy is therefore a poor measure of performance on imbalanced data, since a classifier can score well overall while having poor predictive accuracy on the minority class. While previous work has used several resampling techniques to improve predictive accuracy on the minority class, in this study we explore and compare the effectiveness of the Synthetic Minority Oversampling Technique (SMOTE) and Random Oversampling across multiple learning algorithms and resampling ratios, using eight performance measures on two datasets from diverse domains such as medicine and engineering. The results show that the effectiveness of these resampling techniques depends jointly on the learning algorithm, the resampling ratio, and the inherent characteristics of the dataset. The appropriate choice of performance measure for evaluating models built with these resampling techniques also varies, which provides information useful for future research and applications.
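To make the point about accuracy concrete, the following is a minimal sketch and not taken from the paper. It assumes a hypothetical binary dataset with 95 majority and 5 minority samples and a classifier that always predicts the majority class. The helper names (confusion_counts, safe_div) and the choice of recall and balanced accuracy are illustrative assumptions; the abstract does not list its eight performance measures.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem; `positive` is the minority label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


# Hypothetical data: 95 majority samples (label 0), 5 minority samples (label 1).
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

tp, fp, fn, tn = confusion_counts(y_true, y_pred)

accuracy = safe_div(tp + tn, len(y_true))   # fraction of all samples predicted correctly
minority_recall = safe_div(tp, tp + fn)     # fraction of minority samples found
majority_recall = safe_div(tn, tn + fp)     # fraction of majority samples found
balanced_accuracy = (minority_recall + majority_recall) / 2

print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}, "
      f"balanced accuracy={balanced_accuracy:.2f}")
```

In this sketch the classifier reaches 95% accuracy without identifying a single minority sample, which is why measures that look at each class separately are usually reported alongside accuracy on imbalanced data.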
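The two resampling techniques named in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the resampling ratio is assumed to mean the target minority-to-majority size ratio (the abstract does not define it), the SMOTE-style function is a reduced version of the published technique, and all function names and data values are hypothetical.

```python
import random


def n_to_add(n_minority, n_majority, ratio):
    """Samples to add so that minority/majority reaches `ratio` (assumed definition)."""
    return max(0, round(ratio * n_majority) - n_minority)


def random_oversample(minority, n_new, rng):
    """Random oversampling: duplicate minority rows chosen with replacement."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def smote_like(minority, n_new, k, rng):
    """SMOTE-style oversampling: interpolate between a minority row and one of
    its k nearest minority neighbours (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: sq_dist(base, row))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical data: 20 majority rows and 4 minority rows with 2 features each.
majority = [[float(i), 0.0] for i in range(20)]
minority = [[0.0, 5.0], [1.0, 6.0], [2.0, 5.5], [1.5, 7.0]]

rng = random.Random(0)   # fixed seed so the sketch is repeatable
ratio = 0.5              # assumed meaning: minority size / majority size after resampling
n_new = n_to_add(len(minority), len(majority), ratio)

ros_rows = random_oversample(minority, n_new, rng)
smote_rows = smote_like(minority, n_new, k=2, rng=rng)

print(n_new, len(minority) + len(ros_rows), len(minority) + len(smote_rows))
```

Random oversampling only repeats existing minority rows, whereas the SMOTE-style variant creates new points on line segments between neighbouring minority rows. That difference is one reason the two techniques can interact differently with different learning algorithms and ratios.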
