Conference: International Conference on Future Generation Communication Technologies

HSDD: a hybrid sampling strategy for class imbalance in defect prediction data sets

Abstract

Class imbalance is a common problem in defect prediction data sets. Oversampling and undersampling methods are employed to cope with it. However, these methods are designed for instance-based alteration and are not specialized for the feature space. Moreover, there is no distinctive approach for coping with class imbalance in defect prediction data sets. We develop HSDD (hybrid sampling for defect data sets) to solve this problem. HSDD comprises not only the derivation of low-level metrics but also the reduction of repeated data points. The method was evaluated on industrial and open-source project data sets using Bayes, naive Bayes, random forest, and J48 in terms of g-mean and training time. The results show that HSDD produces promising training performance, especially on large-scale data sets.
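The abstract does not spell out HSDD's steps, so the following Python sketch is not the authors' method. It only illustrates three generic ideas the abstract names: the g-mean metric (taken here in its usual form, the square root of sensitivity times specificity), removal of repeated data points, and plain random undersampling as a stand-in sampling step. The function names, the `(features, label)` row layout, and the convention that label `1` marks a defective module are all assumptions made for illustration.

```python
import math
import random


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def drop_repeated(rows):
    """Keep the first occurrence of each (features, label) row."""
    seen = set()
    unique = []
    for features, label in rows:
        key = (tuple(features), label)
        if key not in seen:
            seen.add(key)
            unique.append((features, label))
    return unique


def random_undersample(rows, minority_label=1, seed=0):
    """Randomly discard majority rows until both classes have equal size.

    Generic random undersampling, used here only as a placeholder.
    """
    rng = random.Random(seed)
    minority = [r for r in rows if r[1] == minority_label]
    majority = [r for r in rows if r[1] != minority_label]
    if len(majority) <= len(minority):
        return list(rows)
    return minority + rng.sample(majority, len(minority))


if __name__ == "__main__":
    # Hypothetical toy data: two metric values per module, label 1 = defective.
    data = [([10, 2], 0), ([10, 2], 0), ([12, 3], 0), ([40, 9], 1),
            ([15, 1], 0), ([11, 2], 0), ([38, 7], 1)]
    balanced = random_undersample(drop_repeated(data))
    print(len(data), len(balanced))
    # A model that never predicts "defective" scores 0 on g-mean,
    # which is why the metric suits imbalanced data.
    print(g_mean([1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]))
```

G-mean rewards a classifier only when it performs well on both classes, whereas plain accuracy can look high on an imbalanced set while every defective module is missed.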
