After analyzing the shortcomings of the traditional support vector machine (SVM) when learning from imbalanced data, this paper proposes an improved SVM algorithm. It applies adaptive synthetic (ADASYN) sampling to partially resample the dataset, increasing the number of minority-class samples; assigns different weights to different sample points to reduce the influence of noise on the training result; and trains with a cost-sensitive SVM to mitigate the hyperplane shift caused by imbalanced data. The algorithm is tested on six imbalanced datasets from the UCI database. The experimental results show that the improved SVM algorithm outperforms the other algorithms on every dataset and achieves a good balance between minority-class accuracy and majority-class accuracy.
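The resampling and cost-sensitive steps can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption: the neighbour-ratio rule follows the commonly published ADASYN formulation, `beta < 1` stands in for "partial" resampling, the cost ratio is a common inverse-frequency heuristic, and all function and parameter names (`adasyn_counts`, `synthesize`, `class_costs`, `beta`, `k`) are hypothetical. The per-sample noise weighting is omitted because the abstract does not describe the scheme.

```python
import math
import random


def adasyn_counts(X, y, minority=1, k=3, beta=0.5):
    """Number of synthetic points to generate per minority sample (ADASYN-style)."""
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(minority_idx)
    n_maj = len(y) - n_min
    # Total synthetic samples; beta < 1 closes the class gap only partially.
    total = max(0, int((n_maj - n_min) * beta))
    ratios = []
    for i in minority_idx:
        # k nearest neighbours of X[i] among all other points.
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )[:k]
        # Fraction of those neighbours that belong to the majority class.
        ratios.append(sum(y[j] != minority for j in neighbours) / k)
    norm = sum(ratios)
    if norm == 0:
        # No minority point borders the majority class: spread evenly.
        return {i: total // n_min for i in minority_idx}
    # Harder-to-learn points (more majority neighbours) get more synthetics.
    return {i: round(r / norm * total) for i, r in zip(minority_idx, ratios)}


def synthesize(X, y, counts, minority=1, k=3, seed=0):
    """Interpolate between each minority point and a nearby minority neighbour."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    new_points = []
    for i, g in counts.items():
        mates = sorted(
            (j for j in minority_idx if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )[:k]
        if not mates:
            continue  # a lone minority point has nothing to interpolate with
        for _ in range(g):
            j = rng.choice(mates)
            lam = rng.random()
            new_points.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return new_points


def class_costs(n_min, n_maj, C=1.0):
    """Assumed heuristic: penalise minority-class errors by the class ratio."""
    return {"minority": C * n_maj / n_min, "majority": C}


# Toy data: eight majority points (label 0) and two minority points (label 1).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1], [0, 2], [1, 2],
     [2, 2], [5, 5]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

counts = adasyn_counts(X, y)
new_points = synthesize(X, y, counts)
costs = class_costs(n_min=2 + len(new_points), n_maj=8)
```

Because only part of the gap is closed, the classes remain unequal after resampling, and the remaining imbalance is what the cost-sensitive penalties are meant to absorb. In a library such as scikit-learn, per-class penalties of this kind are typically passed to the SVM through a class-weight argument.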