...
Journal: Machine Learning

LoRAS: an oversampling approach for imbalanced datasets


Abstract

The Synthetic Minority Oversampling TEchnique (SMOTE) is widely used for the analysis of imbalanced datasets. It is known that SMOTE frequently over-generalizes the minority class, leading to misclassifications for the majority class and affecting the overall balance of the model. In this article, we present an approach that overcomes this limitation of SMOTE, employing Localized Random Affine Shadowsampling (LoRAS) to oversample from an approximated data manifold of the minority class. We benchmarked our algorithm on 14 publicly available imbalanced datasets using three different Machine Learning (ML) algorithms and compared the performance of LoRAS, SMOTE, and several SMOTE extensions that, like LoRAS, use convex combinations of minority class data points for oversampling. We observed that LoRAS, on average, generates better ML models in terms of F1-Score and Balanced accuracy. Another key observation is that while most of the SMOTE extensions we tested improve the F1-Score relative to SMOTE on average, they compromise on the Balanced accuracy of the classification model. LoRAS, by contrast, improves both the F1-Score and the Balanced accuracy, and thus produces better classification models. Moreover, to explain the success of the algorithm, we have constructed a mathematical framework to prove that the LoRAS oversampling technique provides a better estimate of the mean of the underlying local data distribution of the minority class data space.
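
To make the idea of oversampling by convex combinations of minority class points more concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not spell out the algorithm, so several details here are assumptions: the function names `smote_like` and `loras_like`, all parameter names and default values, the use of Gaussian noise to create "shadowsamples", and the use of Dirichlet-distributed weights for the random convex combination.

```python
import numpy as np


def k_nearest(X, i, k):
    """Indices of the k nearest neighbours of X[i] (Euclidean), excluding i."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf  # never pick the point itself
    return np.argsort(d)[:k]


def smote_like(X_min, n_new, k=5, rng=None):
    """SMOTE-style sampling: interpolate between a point and one neighbour."""
    rng = np.random.default_rng(rng)
    out = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = rng.integers(len(X_min))
        nb = rng.choice(k_nearest(X_min, i, k))
        t = rng.random()  # interpolation weight in [0, 1)
        out[j] = X_min[i] + t * (X_min[nb] - X_min[i])
    return out


def loras_like(X_min, n_new, k=5, n_shadow=10, n_affine=3,
               sigma=0.005, rng=None):
    """LoRAS-inspired sampling (assumed details, see lead-in).

    1. Pick a minority point and its k nearest minority neighbours.
    2. Create noisy copies ("shadowsamples") of that neighbourhood.
    3. Return a random convex combination of a few shadowsamples.
    """
    rng = np.random.default_rng(rng)
    out = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = rng.integers(len(X_min))
        hood = X_min[np.append(k_nearest(X_min, i, k), i)]
        # Assumption: shadowsamples are drawn with isotropic Gaussian noise.
        shadows = np.repeat(hood, n_shadow, axis=0)
        shadows = shadows + rng.normal(0.0, sigma, size=shadows.shape)
        # Assumption: weights come from a flat Dirichlet distribution,
        # so they are non-negative and sum to 1 (a convex combination).
        pick = rng.choice(len(shadows), size=n_affine, replace=False)
        w = rng.dirichlet(np.ones(n_affine))
        out[j] = w @ shadows[pick]
    return out


if __name__ == "__main__":
    demo_rng = np.random.default_rng(0)
    X_minority = demo_rng.normal(size=(20, 2))  # toy minority class
    print(smote_like(X_minority, 3, rng=1))
    print(loras_like(X_minority, 3, rng=1))
```

In this sketch, `smote_like` only ever places a new point on the segment between two real minority points, whereas `loras_like` combines several noisy neighbourhood points at once. That difference is meant only to illustrate the contrast the abstract draws; the benchmark results reported above come from the authors' own implementation, not from this code.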
