Conference: International Conference on Communication Systems and Network Technologies

Hellinger distance based oversampling method to solve multi-class imbalance problem

Abstract

Classification is a widely used technique for predicting the group membership of data samples. Multi-class (multinomial) classification is the problem of assigning instances to one of more than two classes. As technology has advanced, multi-class data has grown more complex, which leads to the class imbalance problem. On an imbalanced dataset, a machine learning algorithm cannot make accurate predictions. This paper therefore proposes a Hellinger distance based oversampling method. The method balances the dataset so that the minority class can be identified with high accuracy without reducing accuracy on the majority class; it generates new synthetic data to reach the target balance ratio. The method was tested on five benchmark datasets using two standard classifiers, KNN and C4.5, and precision, recall, and F-measure are reported for both. The Hellinger distance is observed to reduce the risk of overlap and skewness in the data. The results show a 20% increase in classification accuracy compared with classifying the imbalanced multi-class dataset.
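For reference, the Hellinger distance between two discrete distributions P and Q is H(P, Q) = (1/sqrt(2)) * sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2), which ranges from 0 (identical) to 1 (no overlap). The sketch below computes it for two histograms, such as the binned values of one feature in two classes. The abstract does not describe how the distance is used inside the oversampling procedure, so the function name and the histogram inputs are illustrative assumptions, not the paper's implementation.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions.

    `p` and `q` are sequences of non-negative counts or probabilities
    over the same bins; each is normalized to sum to 1 before comparison.
    (Hypothetical helper for illustration only.)
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each distribution must have positive total mass")

    # Sum of squared differences between the square roots of the probabilities.
    squared = sum(
        (math.sqrt(a / total_p) - math.sqrt(b / total_q)) ** 2
        for a, b in zip(p, q)
    )
    return math.sqrt(0.5 * squared)


if __name__ == "__main__":
    # Made-up bin counts for one feature in a minority and a majority class.
    minority_hist = [8, 2, 0]
    majority_hist = [10, 30, 60]
    print(hellinger_distance(minority_hist, majority_hist))
```

Because each histogram is normalized separately, the distance compares the shapes of the two class distributions and does not depend on how many samples each class contains, which is why it is often preferred for skewed data.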
