IEEE Transactions on Neural Networks and Learning Systems

Oversampling the Minority Class in the Feature Space



Abstract

The imbalanced nature of some real-world data is one of the current challenges for machine learning researchers. One common approach oversamples the minority class through convex combinations of its patterns. We explore the general idea of synthetic oversampling in the feature space induced by a kernel function (as opposed to input space). If the kernel function matches the underlying problem, the classes will be linearly separable and synthetically generated patterns will lie on the minority class region. Since the feature space is not directly accessible, we use the empirical feature space (EFS), a Euclidean space isomorphic to the feature space, for oversampling purposes. The proposed method is framed in the context of support vector machines, where imbalanced data sets can pose a serious hindrance. The idea is investigated in three scenarios: 1) oversampling in the full and reduced-rank EFSs; 2) a kernel learning technique maximizing the data class separation to study the influence of the feature space structure (implicitly defined by the kernel function); and 3) a unified framework for preferential oversampling that spans some of the previous approaches in the literature. We support our investigation with extensive experiments over 50 imbalanced data sets.
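The following is a minimal NumPy sketch of the core construction the abstract describes: mapping the training data into the empirical feature space via the eigendecomposition of the kernel matrix, then generating synthetic minority patterns as convex combinations there. The RBF kernel, the `gamma` value, and the SMOTE-like nearest-neighbour pairing are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch: synthetic oversampling of the minority class in the empirical feature space (EFS).
# Assumptions for illustration: RBF kernel, full- or reduced-rank EFS via eigendecomposition
# of the kernel matrix K = V Lambda V^T, and SMOTE-like convex combinations of neighbours.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gram matrix of k(a, b) = exp(-gamma * ||a - b||^2)."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * d2)

def empirical_feature_map(X_train, gamma=1.0, rank=None, eps=1e-10):
    """EFS map phi_e(x) = Lambda^{-1/2} V^T k(x); rank=None keeps the full-rank EFS."""
    K = rbf_kernel(X_train, X_train, gamma)
    eigvals, eigvecs = np.linalg.eigh(K)
    order = np.argsort(eigvals)[::-1]                  # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > eps
    if rank is not None:
        keep &= np.arange(len(eigvals)) < rank         # reduced-rank EFS
    L, V = eigvals[keep], eigvecs[:, keep]
    proj = V / np.sqrt(L)                              # columns scaled by Lambda^{-1/2}
    return lambda X: rbf_kernel(X, X_train, gamma) @ proj

def oversample_minority_efs(X, y, minority_label, n_new, k=5, gamma=1.0, seed=None):
    """Generate n_new synthetic minority patterns as convex combinations in the EFS."""
    rng = np.random.default_rng(seed)
    phi = empirical_feature_map(X, gamma=gamma)
    Z_min = phi(X[y == minority_label])                # minority class mapped into the EFS
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(Z_min))
        d = np.linalg.norm(Z_min - Z_min[i], axis=1)
        j = rng.choice(np.argsort(d)[1:k + 1])         # one of the k nearest minority neighbours
        lam = rng.random()                             # convex combination weight in [0, 1]
        synthetic.append(Z_min[i] + lam * (Z_min[j] - Z_min[i]))
    return np.asarray(synthetic)
```

On the training set, inner products in this EFS reproduce the kernel values, so convex combinations taken there correspond to synthetic points in the kernel-induced feature space rather than in input space, which is the motivation the abstract gives for oversampling in the EFS.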

