Real-value negative selection over-sampling for imbalanced data set learning

Abstract

Learning from imbalanced data sets poses a major challenge for the data mining community. Conventional machine learning algorithms perform poorly on imbalanced classification problems because they were originally designed for balanced class distributions. In this paper, we propose a new over-sampling technique that uses the real-value negative selection (RNS) procedure to generate artificial minority data without requiring any actual minority data. The generated minority data, together with the rare actual minority data if any are available, are combined with the majority data as input to a bi-class classification approach for learning. In the experiments, we demonstrate the effectiveness of RNS in avoiding problems often encountered by existing over-sampling methods, such as the generation of noisy instances and of near-duplicate instances within the same clusters. Moreover, extensive experimental results on imbalanced datasets from the UCI repository and on real-world imbalanced datasets show that the proposed hybrid approach achieves better performance in terms of both the G-Mean and F-Measure evaluation metrics than other existing imbalanced dataset classification techniques. (C) 2019 Elsevier Ltd. All rights reserved.
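
The idea of generating minority data from the majority class alone can be illustrated with a minimal sketch of generic real-valued negative selection. This is not the authors' implementation: the function name `rns_oversample`, the fixed `radius`, the assumption that features are already scaled to the unit hypercube, and the toy data are all illustrative choices that the abstract does not specify. Majority samples are treated as the "self" set, random candidates are drawn uniformly, and any candidate that falls within `radius` of a self sample is discarded; the survivors serve as artificial minority samples.

```python
import numpy as np


def rns_oversample(majority, n_samples, radius=0.1, max_tries=100_000, seed=0):
    """Sketch of real-valued negative selection (hypothetical helper).

    Assumes `majority` is an (n, d) array already scaled to [0, 1]^d.
    Returns up to `n_samples` points lying farther than `radius`
    from every majority ("self") sample.
    """
    rng = np.random.default_rng(seed)
    majority = np.asarray(majority, dtype=float)
    dim = majority.shape[1]
    accepted = []
    for _ in range(max_tries):
        if len(accepted) >= n_samples:
            break
        # Draw a random candidate in the unit hypercube.
        candidate = rng.random(dim)
        # Distance from the candidate to its nearest self sample.
        nearest = np.linalg.norm(majority - candidate, axis=1).min()
        # Keep the candidate only if it does not match any self sample.
        if nearest > radius:
            accepted.append(candidate)
    return np.array(accepted).reshape(-1, dim)


# Toy majority class confined to the square [0.3, 0.5]^2 (illustrative data).
rng = np.random.default_rng(42)
majority = 0.3 + 0.2 * rng.random((200, 2))
synthetic = rns_oversample(majority, n_samples=50)
```

In a pipeline like the one the abstract describes, `synthetic` would be stacked with any real minority samples and with `majority` before training a two-class classifier. The choice of radius controls how close artificial points may come to the majority region, and a fixed value is used here only for simplicity.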

Bibliographic details

  • Source
    Expert Systems with Applications | 2019, Issue 9 | pp. 118-134 | 17 pages
  • Author affiliations

    Northeast Forestry Univ, Coll Engn & Technol, 26 Hexing Rd, Harbin 150040, Heilongjiang, Peoples R China;

  • Original format: PDF
  • Language: English
  • Keywords

    Imbalanced data set; Over-sampling technique; Real-value negative selection; Under-sampling;
