Conference: IEEE International Conference on Data Science in Cyberspace

Research on Classification Method of High-Dimensional Class-Imbalanced Data Sets Based on SVM


Abstract

In recent years, classification of high-dimensional, class-imbalanced data has become a common problem in many fields, such as bioinformatics. High dimensionality degrades classification results because some combinations of features have an adverse effect on classification. Class imbalance means that one class has many more samples than another, which leads the classifier to focus on the majority class at the expense of the minority class. Both problems are present in high-dimensional, class-imbalanced data sets. Many researchers have studied the high-dimensional problem and the class-imbalance problem separately and have proposed a series of algorithms, but they have ignored the new problem that arises from the interaction between the two. This article first introduces the two problems and analyzes the new problem arising from their mutual influence. It then introduces SVM and analyzes its advantages in dealing with high dimensionality and class imbalance. Next, it improves SVM-RFE by taking class imbalance into account during feature selection, and improves SMOTE so that over-sampling can be performed in the Hilbert space while the over-sampling rates are set adaptively. Finally, it proposes a classification algorithm for high-dimensional, class-imbalanced data sets, named BRFE-PBKS-SVM: Border-Resampling Feature Elimination and PSO Border-Kernel-SMOTE SVM. A series of experiments with different evaluation metrics was carried out to demonstrate the effectiveness of the algorithm.
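For readers unfamiliar with SMOTE, the following is a minimal sketch of the standard interpolation step it is built on, written in plain Python. It is not the paper's Border-Kernel-SMOTE: it works in the input space with Euclidean distance rather than in the Hilbert space, and it uses a fixed number of new samples rather than adaptively chosen over-sampling rates. The function name `smote`, its parameters, and the toy data are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Illustrative sketch only: plain input-space SMOTE, not the kernel-space
    variant described in the abstract.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Pick one of the k nearest minority neighbors at random.
        neighbor = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical toy minority class with two features.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6)
print(len(new_points))
```

Because every synthetic point is a convex combination of two existing minority samples, it always lies on the line segment between them. A kernel-space variant would presumably replace the Euclidean distance and interpolation with operations defined through the kernel function, but the abstract does not give those details.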
