Source: International Conference on Artificial Neural Networks (foreign-language conference)

Imbalanced Data Classification Based on MBCDK-means Undersampling and GA-ANN



Abstract

Imbalanced classification arises when one class contains only a few samples while the other contains many. Traditional machine learning classifiers applied to such data sets tend to classify poorly and incur a high time cost. To address both problems, this paper proposes a mini batch with cluster distribution K-means (MBCDK-means) undersampling method combined with a GA-ANN model. MBCDK-means selects samples according to the cluster distribution and the distance from the majority class clusters to the minority class cluster center. This preserves the original cluster distribution and raises the sampling rate of boundary samples, which helps improve the final classification performance. The proposed MBCDK-means undersampling method also has lower time complexity than the classic K-means clustering undersampling method. Artificial neural networks (ANNs) are widely used in data classification but are easily trapped in local minima. For this reason, the paper adopts a genetic algorithm artificial neural network (GA-ANN), which uses a genetic algorithm to optimize the weights and biases of the neural network and achieves better performance than a plain ANN. Experimental results on 8 data sets show the effectiveness of the proposed algorithm.
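The GA-ANN idea described in the abstract, a genetic algorithm searching over the weights and biases of a neural network instead of relying on gradient descent alone, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 2-2-1 network shape, the XOR toy data, tournament selection, one-point crossover, Gaussian mutation, two-member elitism, and all names such as `evolve` and `fitness`. The paper's actual encoding, operators, and hyperparameters are not given in the abstract.

```python
import math
import random

# Hypothetical toy setup: a 2-2-1 feed-forward network whose weights and
# biases are flattened into one "genome" (9 numbers). Not from the paper.
N_IN, N_HID = 2, 2
GENOME_LEN = N_HID * (N_IN + 1) + (N_HID + 1)

# Toy training data (XOR), used only to exercise the sketch.
XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def forward(genome, x):
    """Evaluate the network encoded by `genome` on one input vector."""
    hidden, pos = [], 0
    for _ in range(N_HID):
        weights, bias = genome[pos:pos + N_IN], genome[pos + N_IN]
        hidden.append(sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias))
        pos += N_IN + 1
    out_w, out_b = genome[pos:pos + N_HID], genome[pos + N_HID]
    return sigmoid(sum(w * h for w, h in zip(out_w, hidden)) + out_b)


def fitness(genome, data):
    """Negative mean squared error, so higher is better."""
    return -sum((forward(genome, x) - y) ** 2 for x, y in data) / len(data)


def evolve(data, pop_size=40, generations=60, mut_rate=0.2, mut_scale=0.5, seed=0):
    """Return (best_genome, best_fitness_per_generation)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(GENOME_LEN)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, data), reverse=True)
        history.append(fitness(pop[0], data))
        elite = pop[:2]  # elitism: the two best genomes survive unchanged
        children = []
        while len(children) < pop_size - len(elite):
            # Tournament selection: best of three random genomes, twice.
            p1 = max(rng.sample(pop, 3), key=lambda g: fitness(g, data))
            p2 = max(rng.sample(pop, 3), key=lambda g: fitness(g, data))
            cut = rng.randrange(1, GENOME_LEN)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Gaussian mutation applied independently to each gene.
            child = [g + rng.gauss(0.0, mut_scale) if rng.random() < mut_rate else g
                     for g in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=lambda g: fitness(g, data))
    history.append(fitness(best, data))
    return best, history


if __name__ == "__main__":
    best, history = evolve(XOR)
    print("initial best fitness:", history[0])
    print("final best fitness:  ", history[-1])
    for x, y in XOR:
        print(x, "target", y, "output", round(forward(best, x), 3))
```

Because the two best genomes are copied unchanged into each new generation, the best fitness in this sketch can never decrease from one generation to the next. A common variant, which may or may not match the paper, uses the genetic algorithm only to pick initial weights and then fine-tunes them with backpropagation.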
