Under-sampling class imbalanced datasets by combining clustering analysis and instance selection

Tsai Chih-Fong; Lin Wei-Chao; Hu Ya-Han; Yao Guan-Ting

Abstract

Class-imbalanced datasets, i.e., those in which one class contains many more data samples than another, occur in many real-world problems. With such datasets it is very difficult to construct effective classifiers using current classification algorithms, especially for distinguishing small or minority classes from the majority class. To address the class imbalance problem, under-sampling and over-sampling techniques have been widely used to reduce the number of data samples in the majority class and to enlarge the number in the minority class, respectively. Moreover, combining certain sampling approaches with ensemble classifiers has shown reasonably good performance. In this paper, a novel under-sampling approach called cluster-based instance selection (CBIS), which combines clustering analysis and instance selection, is introduced. The clustering analysis component groups similar data samples of the majority class into 'subclasses', while the instance selection component filters out unrepresentative data samples from each 'subclass'. Experimental results on the KEEL dataset repository show that CBIS enables bagging- and boosting-based MLP ensemble classifiers to perform significantly better than six state-of-the-art approaches, regardless of which clustering (affinity propagation and k-means) and instance selection (IB3, DROP3 and GA) algorithms are used. (C) 2018 Elsevier Inc. All rights reserved.
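
The abstract describes the two CBIS stages only at a high level. The following is a minimal sketch of how they might be wired together in plain Python, not the authors' implementation. Assumptions: k-means with a deterministic farthest-first initialisation stands in for the clustering step (the paper also uses affinity propagation), and Wilson's Edited Nearest Neighbour (ENN) rule stands in for IB3, DROP3 and GA. Using each cluster index as a 'subclass' label during the ENN vote, and letting minority samples take part in that vote, are illustrative choices; the names `kmeans`, `cbis_undersample` and `MINORITY` are hypothetical.

```python
"""Illustrative sketch of CBIS-style under-sampling (not the authors' code).

Assumptions: k-means with deterministic farthest-first initialisation stands in
for the clustering step, and Wilson's Edited Nearest Neighbour (ENN) rule stands
in for IB3 / DROP3 / GA as the instance selection step.
"""
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
MINORITY = -1  # hypothetical label given to minority samples in the ENN vote


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[int]:
    """Group points into k clusters ('subclasses'); returns a label per point."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        # Farthest-first initialisation: the next centroid is the point
        # farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(far))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def cbis_undersample(majority: Sequence[Point], minority: Sequence[Point],
                     k: int, n_neighbors: int = 3) -> List[Point]:
    """Cluster the majority class, then drop unrepresentative samples per subclass."""
    subclass = kmeans(majority, k)
    points = list(majority) + list(minority)
    labels = subclass + [MINORITY] * len(minority)
    kept: List[Point] = []
    for i, p in enumerate(majority):
        # Nearest neighbours of p among all other samples of both classes;
        # distance ties are broken by label so the result is deterministic.
        neighbours = sorted((math.dist(p, q), labels[j])
                            for j, q in enumerate(points) if j != i)[:n_neighbors]
        votes = [lab for _, lab in neighbours]
        # ENN rule: keep p only if a strict majority of its neighbours share
        # its subclass label; otherwise treat it as unrepresentative.
        if votes.count(subclass[i]) * 2 > len(votes):
            kept.append(p)
    return kept


if __name__ == "__main__":
    majority = [(0, 0), (0, 1), (1, 0), (1, 1),
                (10, 10), (10, 11), (11, 10), (11, 11), (5, 5)]
    minority = [(5, 4.5), (5.5, 5), (4.5, 5), (5, 5.5)]
    reduced = cbis_undersample(majority, minority, k=2)
    print(len(reduced), (5, 5) in reduced)  # → 8 False
    training_set = reduced + minority  # would feed the (omitted) ensemble classifier
```

In the paper, the reduced majority set together with the minority samples is used to train bagging- and boosting-based MLP ensembles; that step is left out here. ENN was chosen only because it is short and is a genuine instance selection rule: it is an editing rule that removes noisy or boundary samples, whereas IB3 and DROP3 also discard redundant interior samples, so this sketch usually shrinks the majority class less aggressively than those algorithms would.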

Bibliographic record

Source
Information Sciences: An International Journal, 2019 (8 pages)
Authors
Tsai Chih-Fong; Lin Wei-Chao; Hu Ya-Han; Yao Guan-Ting
Author affiliations
Department of Information Management, National Central University, Zhongli, Taiwan
Department of Information Management, Chang Gung University, Taoyuan, Taiwan
Department of Information Management, National Chung Cheng University, Chiayi, Taiwan
Department of Information Management, National Central University, Zhongli, Taiwan

Record information
Original format: PDF
Language: English
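Chinese Library Classification: automatic information theory; computer applications; information and knowledge dissemination; automation technology, computer technology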
Keywords
Data mining; Class imbalance; Clustering; Ensemble classifiers; Instance selection

Similar literature

1. Under-sampling class imbalanced datasets by combining clustering analysis and instance selection [J]. Tsai Chih-Fong, Lin Wei-Chao, Hu Ya-Han. Information Sciences: An International Journal, 2019.
2. A Noisy-sample-removed Under-sampling Scheme for Imbalanced Classification of Public Datasets [J]. Honghao Zhu, Guanjun Liu, Mengchu Zhou. IFAC PapersOnLine, 2020, no. 5.
3. Critical Instances Removal based Under-Sampling (CIRUS): A solution for class imbalance problem [J]. Rekha Gillala, Reddy V. Krishna, Tyagi Amit Kumar. International Journal of Hybrid Intelligent Systems, 2020, no. 2.
4. Feature selection and Ensemble Hierarchical Cluster-based Under-sampling approach for extremely imbalanced datasets: Application to gene classification [C]. Soltani Sima, Sadri Javad, Torshizi Hassan Ahmadi. International eConference on Computer and Knowledge Engineering (ICCKE), 2011.
5. Classifier design to improve pattern classification and knowledge discovery for imbalanced datasets [D]. Wang, Kun. 2009.
6. Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification [O]. Jinyan Li, Simon Fong, Yunsick Sung. 2016.
7. Under-Sampling Approaches for Improving Prediction of the Minority Class in an Imbalanced Dataset [O]. Show-jane Yen, Yue-shi Lee. 2008.