
A Segmentation and Re-balancing Approach for Classification of Imbalanced Data.



Abstract

Classification is one of the important tasks of data mining. Class imbalance, that is, a difference in class distribution, has been reported to hinder the performance of standard classification models. This dissertation first presents a systematic study that evaluates the impact of class imbalance on several critical steps of learning, namely feature selection, model fitting and performance evaluation. However, the study also shows that class imbalance may not be the only cause of the loss of performance, and that the underlying complexity of the problem may play a more fundamental role. The dissertation therefore proposes the K-S tree, a decision tree method based on the Kolmogorov-Smirnov statistic, to segment the data so that a complex problem can be dissected into easier sub-problems, within each of which class imbalance becomes less challenging. The K-S tree is also used to perform feature selection, which both selects relevant variables and removes redundant ones. After segmentation, two-way re-sampling is performed at the segment level, and the rebalanced data are used to fit logistic regression models, also at the segment level. The effectiveness of the proposed method is demonstrated through three case studies: automatic detection of microcalcification in mammograms, San Diego housing refinance prediction, and credit risk assessment.
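The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names, not the dissertation's actual algorithm. It assumes that a K-S split on one numeric feature is placed where the gap between the two class-conditional empirical CDFs is largest, and that "two-way re-sampling" means over-sampling the minority class while under-sampling the majority class until the two are equal in size. The function names `ks_split` and `two_way_resample` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def ks_split(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (ks_statistic, threshold) for one numeric feature.

    The statistic is the largest gap between the empirical CDFs of the
    feature in class 1 and class 0; the threshold is the value where that
    gap occurs (assumed split rule: value <= threshold goes left).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(values, labels))
    best_gap, best_threshold = 0.0, pairs[0][0]
    seen_pos = seen_neg = 0
    for i, (value, label) in enumerate(pairs):
        if label == 1:
            seen_pos += 1
        else:
            seen_neg += 1
        # Compare the CDFs only after the last record of a tied value.
        if i + 1 < len(pairs) and pairs[i + 1][0] == value:
            continue
        gap = abs(seen_pos / n_pos - seen_neg / n_neg)
        if gap > best_gap:
            best_gap, best_threshold = gap, value
    return best_gap, best_threshold


def two_way_resample(rows: Sequence, labels: Sequence[int],
                     seed: int = 0) -> Tuple[List, List[int]]:
    """Balance one segment: over-sample the minority class with replacement
    and under-sample the majority class without replacement (assumed scheme)."""
    rng = random.Random(seed)
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    minority_label = 1 if minority is pos else 0
    target = (len(pos) + len(neg)) // 2  # per-class size after balancing
    extra = [rng.choice(minority) for _ in range(target - len(minority))]
    over = list(minority) + extra
    under = rng.sample(majority, target)
    new_rows = over + under
    new_labels = [minority_label] * len(over) + [1 - minority_label] * len(under)
    return new_rows, new_labels
```

In a full pipeline under these assumptions, the records would be partitioned recursively with `ks_split`, each leaf segment would be rebalanced with `two_way_resample`, and a separate logistic regression model would then be fitted on each rebalanced segment; the model-fitting step is omitted here.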

Bibliographic record

  • Author

    Gong, Rongsheng

  • Author affiliation

    University of Cincinnati

  • Degree-granting institution: University of Cincinnati
  • Subject: Engineering Industrial; Operations Research; Engineering System Science
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 116
  • Original format: PDF
  • Language of text: English
