Journal: Information Sciences: An International Journal

Under-sampling class imbalanced datasets by combining clustering analysis and instance selection



Abstract

Class-imbalanced datasets, i.e., those in which one class contains far more data samples than another, occur in many real-world problems. With such datasets, it is very difficult to construct effective classifiers using current classification algorithms, especially for distinguishing small or minority classes from the majority class. To address the class imbalance problem, undersampling and oversampling techniques have been widely used to reduce the number of data samples in the majority class and to enlarge the number in the minority class, respectively. Moreover, combinations of certain sampling approaches with ensemble classifiers have shown reasonably good performance. In this paper, a novel undersampling approach called cluster-based instance selection (CBIS), which combines clustering analysis and instance selection, is introduced. The clustering analysis component groups similar data samples of the majority class dataset into 'subclasses', while the instance selection component filters out unrepresentative data samples from each of the 'subclasses'. The experimental results based on the KEEL dataset repository show that the CBIS approach can make bagging and boosting-based MLP ensemble classifiers perform significantly better than six state-of-the-art approaches, regardless of which clustering (affinity propagation and k-means) and instance selection (IB3, DROP3 and GA) algorithms are used. © 2018 Elsevier Inc. All rights reserved.
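The two components described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a plain k-means for the clustering step and, as a simple stand-in for IB3, DROP3 or GA, keeps the samples nearest to each subclass mean. The function names (`kmeans`, `select_representatives`, `cbis_undersample`), the `keep_ratio` parameter, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def select_representatives(members, keep_ratio):
    """Stand-in instance selection (NOT IB3/DROP3/GA):
    keep the members closest to the subclass mean."""
    center = tuple(sum(x) / len(members) for x in zip(*members))
    n_keep = max(1, round(len(members) * keep_ratio))
    return sorted(members, key=lambda p: math.dist(p, center))[:n_keep]


def cbis_undersample(majority, k=3, keep_ratio=0.5, seed=0):
    """Cluster the majority class into subclasses, then filter each one."""
    labels = kmeans(majority, k, seed=seed)
    reduced = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        if members:  # a cluster may end up empty
            reduced.extend(select_representatives(members, keep_ratio))
    return reduced


# Synthetic, illustrative data: 90 majority samples and 10 minority samples.
rng = random.Random(42)
majority = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in [(0, 0), (5, 5), (0, 5)]
    for _ in range(30)
]
minority = [(rng.gauss(2.5, 0.5), rng.gauss(2.5, 0.5)) for _ in range(10)]

reduced = cbis_undersample(majority, k=3, keep_ratio=0.3)
print(len(majority), "->", len(reduced), "majority samples;",
      len(minority), "minority samples (unchanged)")
```

Only the majority class is reduced; the minority class is left untouched, and the reduced majority set plus the minority set would then be passed to a classifier. The abstract reports results with bagging and boosting-based MLP ensembles, which this sketch does not attempt to reproduce.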
