A Pruning-Based Approach for Searching Precise and Generalized Region for Synthetic Minority Over-Sampling

Abstract

One way to deal with class imbalance is to modify the class distribution of the data. Synthetic over-sampling is a well-known method that does this by generating new synthetic minority data. Synthetic Minority Over-sampling TEchnique (SMOTE) is a state-of-the-art synthetic over-sampling algorithm that generates new synthetic data along the line between each minority data point and its selected nearest neighbors. The advantage of SMOTE is that it makes decision regions larger and less specific to the original data. Its drawback is the over-generalization problem, in which synthetic data is generated inside the majority class region. Over-generalization causes non-minority class regions to be misclassified as minority class. To overcome the over-generalization problem, we propose an algorithm, called TRIM, that searches for a precise minority region while maintaining its generalization. TRIM iteratively filters out irrelevant majority data from the precise minority region. The output of the algorithm is multiple sets of seed minority data, and each individual set is used to generate new synthetic data. Compared with state-of-the-art over-sampling algorithms, experimental results show significant performance improvement in terms of F-measure and AUC. This suggests that over-generalization has a significant impact on the performance of synthetic over-sampling methods.
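
The interpolation step that the abstract attributes to SMOTE can be illustrated with a minimal sketch. This is not the authors' code and it does not implement TRIM, whose details the abstract does not give. The function name `smote_sketch`, its parameters `k`, `n_new`, and `seed`, and the sample points are all assumptions made for illustration.

```python
import math
import random


def smote_sketch(minority, k=3, n_new=5, seed=0):
    """Illustrative only: create n_new synthetic points, each on the line
    segment between a minority point and one of its k nearest minority
    neighbours. Majority points are never consulted."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest neighbours of `base` among the other minority points.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical 2-D minority samples, chosen only for this example.
minority_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority_points, k=2, n_new=4)
```

Because the sketch never looks at majority data, a segment between two minority points can pass through a region occupied by the majority class. That is the over-generalization problem the abstract describes.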

Bibliographic record

Source: Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD 2012), 2012, pp. 371-382 (12 pages)
Authors: Kamthorn Puntumapon; Kitsana Waiyamai
Original format: PDF
Chinese Library Classification: TP311.13

Similar literature

1. Software defect prediction with imbalanced distribution by radius-synthetic minority over-sampling technique [J]. Shikai Guo, Jian Dong, Hui Li. Journal of software: evolution and process. 2021, Issue 7
2. The selection of wart treatment method based on Synthetic Minority Over-sampling Technique and Axiomatic Fuzzy Set theory [J]. Biocybernetics and biomedical engineering. 2020, Issue 1
3. Transfer synthetic over-sampling for class-imbalance learning with limited minority class data [J]. Liu Xu-Ying, Wang Sheng-Tao, Zhang Min-Ling. Frontiers of computer science in China. 2019, Issue 5
4. An Improved Intrusion Detection Approach using Synthetic Minority Over-Sampling Technique and Deep Belief Network [C]. S. Hasan ADIL, S. Saad Azhar ALI, Kamran RAZA. International conference on intelligent software methodologies, tools, and techniques. 2014
5. A synthetic approach to forecasting fleet costs and earnings in the northeast regions of the United States [D]. Lallemand, Philippe D. 2001
6. Speech Emotion Recognition Based on Selective Interpolation Synthetic Minority Over-Sampling Technique in Small Sample Environment [O]. Zhen-Tao Liu, Bao-Han Wu, Dan-Yun Li. 2020
7. Synthetic Minority Over-Sampling for Improving Imbalanced Data in Educational Web Usage Mining [O]. Wacharawan Intayoad, Chayapol Kamyod, Punnarumol Temdee. 2019
