Published in: Expert Systems with Applications

Conditional Wasserstein GAN-based oversampling of tabular data for imbalanced learning

Abstract

Class imbalance impedes the predictive performance of classification models. Popular countermeasures include oversampling minority class cases by creating synthetic examples. The paper examines the potential of Generative Adversarial Networks (GANs) for oversampling. A few prior studies have used GANs for this purpose, but they do not reflect recent methodological advances in generating tabular data with GANs. The paper proposes an approach based on a conditional Wasserstein GAN that can effectively model tabular datasets with numerical and categorical variables and that pays special attention to the downstream classification task through an auxiliary classifier loss. We focus on a credit scoring context in which binary classifiers predict the default risk of loan applications. Empirical comparisons in this context provide evidence that GAN-based oversampling is competitive with several standard oversampling regimes. We also clarify the conditions under which oversampling in general, and the proposed GAN-based approach in particular, raise predictive performance. In sum, our findings suggest that GAN architectures for tabular data and our extensions deserve a place in data scientists' modelling toolbox.
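
To make "oversampling minority class cases by creating synthetic examples" concrete, here is a minimal sketch of an interpolation-based oversampler in the spirit of SMOTE. This is not the paper's conditional Wasserstein GAN, and the abstract does not name the baselines it compares against, so the choice of a SMOTE-style scheme is an assumption. The function name `smote_like_oversample`, its parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import random


def smote_like_oversample(X, y, minority_label, k=3, seed=0):
    """Balance a binary dataset by interpolating between minority neighbours.

    X: list of numeric feature lists; y: list of class labels.
    Returns new (X, y) lists with synthetic minority rows appended.
    Illustrative sketch only; assumes purely numerical features.
    """
    rng = random.Random(seed)
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_needed = n_majority - len(minority)
    if len(minority) < 2 or n_needed <= 0:
        return list(X), list(y)  # nothing to interpolate, or already balanced

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_needed):
        base = rng.choice(minority)
        # Up to k nearest minority neighbours of the base point (excluding itself).
        others = [m for m in minority if m is not base]
        neighbours = sorted(others, key=lambda m: sq_dist(base, m))[:k]
        neighbour = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + t * (n - b) for b, n in zip(base, neighbour)])

    return list(X) + synthetic, list(y) + [minority_label] * n_needed


# Toy data: six majority rows (label 0) and two minority rows (label 1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.1], [0.2, 0.3],
     [5.0, 5.0], [5.5, 5.2]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = smote_like_oversample(X, y, minority_label=1)
print(y_bal.count(0), y_bal.count(1))  # prints "6 6"
```

Interpolation of this kind only makes sense for numerical features, which is one reason a generative model that handles mixed numerical and categorical columns, as the abstract describes, may be attractive for tabular data.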
