Expert Systems with Applications
Oversampling method using outlier detectable generative adversarial network

Abstract

A class imbalance problem occurs when one class of data has significantly more or fewer samples than another. This problem is difficult to solve; however, oversampling methods based on the synthetic minority oversampling technique (SMOTE) or on a conditional generative adversarial network (cGAN) have recently been proposed to address it. SMOTE and its variants can generate biased artificial data because they do not consider all of the data in the minority class. An oversampling method using a cGAN has been proposed to overcome this problem. However, such a method does not consider the majority class, which also affects the classification boundary. In particular, if the majority class contains outliers, the classification boundary may be biased. This paper presents an oversampling method using an outlier detectable generative adversarial network (OD-GAN) to solve this problem. We use the discriminator, which in a cGAN serves only for training, as an outlier detector that quantifies the difference between the distributions of the majority and minority classes. The discriminator detects and removes outliers, which prevents them from distorting the classification boundary. The generator imitates the distribution of the minority class and generates artificial data to balance the dataset. We experiment with various datasets, oversampling techniques, and classifiers. The empirical results show that OD-GAN outperforms the other oversampling methods on imbalanced datasets with outliers. (C) 2019 Elsevier Ltd. All rights reserved.
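The pipeline described above (discriminator as outlier filter for the majority class, generator to oversample the minority class) can be illustrated with a minimal NumPy sketch. It is not the paper's implementation and makes several loudly stated assumptions: a logistic-regression classifier stands in for the cGAN discriminator, a Gaussian fitted to the minority class stands in for the cGAN generator, and the outlier rule (drop majority samples the discriminator scores as minority-like with probability of at least 0.5) is a hypothetical reading of "detect and remove outliers", not the paper's exact criterion. All function names and parameters (`threshold`, `lr`, `epochs`) are illustrative.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_discriminator(x_major, x_minor, lr=0.5, epochs=3000):
    """Fit a logistic-regression 'discriminator' (label 1 = minority, 0 = majority).

    Stand-in for the cGAN discriminator; OD-GAN itself uses a neural network.
    Features are standardized so plain gradient descent converges quickly.
    """
    x = np.vstack([x_major, x_minor])
    y = np.concatenate([np.zeros(len(x_major)), np.ones(len(x_minor))])
    mu, sigma = x.mean(axis=0), x.std(axis=0) + 1e-12
    z = (x - mu) / sigma
    w, b = np.zeros(z.shape[1]), 0.0
    for _ in range(epochs):
        err = _sigmoid(z @ w + b) - y  # per-sample gradient of the log-loss w.r.t. the logit
        w -= lr * z.T @ err / len(y)
        b -= lr * err.mean()
    return w, b, mu, sigma


def minority_score(x, params):
    """Discriminator's estimated probability that each row belongs to the minority class."""
    w, b, mu, sigma = params
    return _sigmoid(((x - mu) / sigma) @ w + b)


def remove_majority_outliers(x_major, x_minor, threshold=0.5):
    """Hypothetical rule: drop majority rows the discriminator scores as minority-like."""
    params = train_discriminator(x_major, x_minor)
    keep = minority_score(x_major, params) < threshold
    return x_major[keep]


def sample_minority(x_minor, n_new, rng):
    """Gaussian fitted to the minority class; stand-in for the cGAN generator."""
    mean = x_minor.mean(axis=0)
    cov = np.cov(x_minor, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_new)


def od_gan_style_resample(x_major, x_minor, threshold=0.5, seed=0):
    """Filter majority outliers, then oversample the minority class to the same size."""
    rng = np.random.default_rng(seed)
    major_clean = remove_majority_outliers(x_major, x_minor, threshold)
    n_new = max(len(major_clean) - len(x_minor), 0)
    minor_bal = np.vstack([x_minor, sample_minority(x_minor, n_new, rng)])
    return major_clean, minor_bal


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Majority class around the origin plus three planted outliers inside the minority region.
    x_major = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)), np.full((3, 2), 5.0)])
    x_minor = rng.normal(5.0, 0.5, size=(20, 2))
    major_clean, minor_bal = od_gan_style_resample(x_major, x_minor)
    print(len(x_major), "->", len(major_clean), "majority rows;", len(minor_bal), "minority rows")
```

Filtering runs before oversampling so that the synthetic minority samples are balanced against the cleaned majority set, mirroring the order the abstract describes. Swapping the stand-ins for an actual cGAN discriminator and generator would keep the same control flow.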