HSDD: a hybrid sampling strategy for class imbalance in defect prediction data sets

Abstract

Class imbalance is a common problem in defect prediction data sets. Oversampling and undersampling methods are typically employed to cope with it. However, these methods alter the data at the instance level and are not specialized for the feature space. Moreover, no approach has been designed specifically for class imbalance in defect prediction data sets. We develop HSDD (hybrid sampling for defect data sets) to address this gap. HSDD comprises both the derivation of low-level metrics and the reduction of repeated data points. The method was evaluated on industrial and open-source project data sets using Bayes, naive Bayes, random forest, and J48 classifiers, in terms of g-mean and training time. The results show that HSDD yields promising training performance, especially on large-scale data sets.
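The abstract reports results in terms of g-mean but does not define it. The sketch below assumes the definition commonly used in class-imbalance work, the square root of sensitivity times specificity; the paper may use a different variant. The function name `g_mean` and the toy labels are illustrative and do not come from the paper.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels.

    Assumed definition: sqrt(TP / (TP + FN) * TN / (TN + FP)).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against a class that is absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy data: 2 defective modules (label 1) out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts "not defective" is 80% accurate
# but never finds a defect, so its g-mean is zero.
always_clean = [0] * 10
print(g_mean(y_true, always_clean))  # 0.0

# One defect found, two false alarms: sensitivity 0.5, specificity 0.75.
mixed = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(round(g_mean(y_true, mixed), 3))  # 0.612
```

The first case shows why accuracy alone is misleading on imbalanced defect data: g-mean falls to zero as soon as one class is missed entirely.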

Bibliographic details

Source
International Conference on Future Generation Communication Technologies, 2016, pp. 60-69 (10 pages)
Authors
M. Maruf Ozturk; Ahmet Zengin
Original format
PDF
Keywords
Prediction algorithms; Training; Noise measurement; Software; Support vector machines; Computers

Similar literature

1. Sampling imbalance dataset for software defect prediction using hybrid neuro-fuzzy systems with Naive Bayes classifier [J]. Punitha K., Latha B. Technical Gazette. 2016, No. 6
2. Cross-project defect prediction using data sampling for class imbalance learning: an empirical study [J]. Goel Lipika, Sharma Mayank, Khatri Sunil Kumar. International Journal of Parallel, Emergent and Distributed Systems. 2021, No. 1a2
3. Prediction of secondary testosterone deficiency using machine learning: A comparative analysis of ensemble and base classifiers, probability calibration, and sampling strategies in a slightly imbalanced dataset [J]. Monique Tonani Novaes, Osmar Luiz Ferreira de Carvalho, Pedro Henrique Guimarães Ferreira. Informatics in Medicine Unlocked. 2021, No. a
4. HSDD: A hybrid sampling strategy for class imbalance in defect prediction data sets [C]. M. Maruf Özturk, Ahmet Zengin. International conference on digital information management. 2016
5. Combating the class imbalance problem in small sample data sets [D]. Wasikowski, Michael. 2009
6. Prediction Is a Balancing Act: Importance of Sampling Methods to Balance Sensitivity and Specificity of Predictive Models Based on Imbalanced Chemical Data Sets [O]. Priyanka Banerjee, Frederic O. Dehnbostel, Robert Preissner. 2018
7. Sampling imbalance dataset for software defect prediction using hybrid neuro-fuzzy systems with Naive Bayes classifier [O]. 2016
