Research on Classification Method of High-Dimensional Class-Imbalanced Data Sets Based on SVM

Abstract

In recent years, classification of high-dimensional, class-imbalanced data has become a common problem in many fields, such as bioinformatics. High dimensionality degrades classification results because some combinations of features have an adverse effect on classification. Class imbalance means that one class has more samples than another, which makes the classifier attend more to the majority class and less to the minority class. Both problems are present in high-dimensional, class-imbalanced data sets. Many researchers have studied the high-dimensional problem and the class-imbalance problem separately and have proposed a series of algorithms, but they have ignored the new problem that arises from the interaction of the two. This article first introduces the two problems and analyzes the new problem arising from their mutual influence. It then introduces SVM and analyzes its advantages in dealing with the high-dimensional and class-imbalance problems. Next, it improves SVM-RFE by taking class imbalance into account during feature selection, and improves SMOTE so that over-sampling works in the Hilbert space and the over-sampling rates are set adaptively. Finally, the article proposes a classification algorithm for high-dimensional, class-imbalanced data sets, named BRFE-PBKS-SVM: Border-Resampling Feature Elimination and PSO Border-Kernel-SMOTE SVM. A series of experiments using different evaluation indexes was carried out to demonstrate the effectiveness of the algorithm.
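
For readers unfamiliar with SMOTE, the over-sampling step the abstract builds on can be sketched as follows. This is a minimal illustration of classic input-space SMOTE only, not the kernel-space, border-focused, PSO-tuned variant the paper proposes. The function name `smote`, its parameters, and the toy data are assumptions made for this example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points (classic input-space SMOTE).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical): 4 minority samples against 10 majority samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3], [1.1, 1.1]]
majority_count = 10
new_points = smote(minority, n_new=majority_count - len(minority), k=2)
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Here the over-sampling amount is fixed by hand to match the majority count. The abstract describes setting the rates adaptively and interpolating in the Hilbert space induced by the kernel, which this sketch does not attempt.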

Bibliographic record

Source
IEEE International Conference on Data Science in Cyberspace | 2017 | 669p | 8 pages
Authors
Chunkai Zhang; Jianwei Guo; Junru Lu
Original format: PDF
Chinese Library Classification: TP393-53
Keywords
Support vector machines; Classification algorithms; Interpolation; Training; Bioinformatics; Algorithm design and analysis; Sampling methods

Similar literature

1. Research on classification method of high-dimensional class-imbalanced datasets based on SVM [J]. Zhang Chunkai, Zhou Ying, Guo Jianwei. International Journal of Machine Learning and Cybernetics. 2019, No. 7
2. Dealing with high-dimensional class-imbalanced datasets: Embedded feature selection for SVM classification [J]. Maldonado Sebastian, Lopez Julio. Applied Soft Computing. 2018
3. Research on Classification Method of High-Dimensional Class-Imbalanced Data Sets Based on SVM [C]. Chunkai Zhang, Jianwei Guo, Junru Lu. IEEE International Conference on Data Science in Cyberspace. 2017
4. Classification of High-dimensional Data Based on Multiple Testing Methods [D]. Ma, Chong. 2018
5. Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data [O]. Natalia Becker, Grischa Toedt, Peter Lichter. 2011
6. Imbalanced Data Set CSVM Classification Method Based on Cluster Boundary Sampling [O]. Peng Li, Tian-ge Liang, Kai-hui Zhang. 2016
7. Novel Texture-based Visualization Methods for High-dimensional Multi-field Data Sets [R]. B. Wuensche. 2013