Software Fault Proneness Prediction with Group Lasso Regression: On Factors that Affect Classification Performance

Abstract

Machine learning algorithms have been used extensively for software fault proneness prediction. This paper presents the first application of Group Lasso Regression (G-Lasso) for software fault proneness classification and compares its performance to six widely used machine learning algorithms. Furthermore, we explore the effects of two factors on the prediction performance: the effect of imbalance treatment using the Synthetic Minority Over-sampling Technique (SMOTE), and the effect of the datasets used in building the prediction models. Our experimental results are based on 22 datasets extracted from open source projects. The main findings include: (1) G-Lasso is robust to imbalanced data and significantly outperforms the other machine learning algorithms with respect to Recall and G-Score, i.e., the harmonic mean of Recall and (1 - False Positive Rate). (2) Even though SMOTE improved the performance of all learners, it did not have a statistically significant effect on G-Lasso's Recall and G-Score. Random Forest was in the top performing group of learners for all performance metrics, while Naive Bayes performed the worst of all learners. (3) When using the same change metrics as features, the choice of the dataset had no effect on the performance of most learners, including G-Lasso. Naive Bayes was the most affected, especially when balanced datasets were used.
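
The G-Score named in the abstract is the harmonic mean of Recall and (1 - False Positive Rate). The following is a minimal sketch of that definition in Python, not the authors' code: the function name `g_score`, the zero-division handling, and the confusion-matrix counts are illustrative assumptions.

```python
def g_score(tp: int, fp: int, tn: int, fn: int) -> float:
    """Harmonic mean of Recall and (1 - False Positive Rate).

    tp, fp, tn, fn are confusion-matrix counts, with fault-prone
    modules treated as the positive class (an assumption here).
    """
    # Recall: share of fault-prone modules that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # False Positive Rate: share of fault-free modules wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    one_minus_fpr = 1.0 - fpr
    denom = recall + one_minus_fpr
    # Return 0.0 when both terms are zero (an assumed convention).
    return 2.0 * recall * one_minus_fpr / denom if denom else 0.0


# Hypothetical counts, chosen only for illustration.
score = g_score(tp=30, fp=20, tn=80, fn=10)
print(round(score, 4))
```

Because it is a harmonic mean, the score stays low unless both Recall and (1 - False Positive Rate) are high, so a learner that flags every module as fault-prone is not rewarded for its perfect Recall.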

Bibliographic details

Source
IEEE Annual Computer Software and Applications Conference, 2019, pp. 336-343 (8 pages)
Authors
Katerina Goseva-Popstojanova; Mohammad Ahmad; Yasser Alshehri
Original format
PDF
Keywords
Software; Measurement; Machine learning algorithms; Prediction algorithms; Radio frequency; Software algorithms; Predictive models

Similar literature

1. Identification of latent variables using, factor analysis and multiple linear regression for software fault prediction [J]. Deepak Sharma, Pravin Chandra. International journal of systems assurance engineering and management. 2019, No. 6
2. Dynamic Fault Prediction of Power Transformers Based on Lasso Regression and Change Point Detection by Dissolved Gas Analysis [J]. Jun Jiang, Ruyi Chen, Chaohai Zhang. IEEE Transactions on Dielectrics and Electrical Insulation. 2020, No. 6
3. Retrospective Study on the Influencing Factors and Prediction of Hospitalization Expenses for Chronic Renal Failure in China Based on Random Forest and LASSO Regression [J]. Pingping Dai, Weifu Chang, Zirui Xin. Frontiers in Public Health. 2021, No. a
4. Software Fault Proneness Prediction with Group Lasso Regression: On Factors that Affect Classification Performance [C]. Katerina Goseva-Popstojanova, Mohammad Ahmad, Yasser Alshehri. IEEE Annual Computer Software and Applications Conference. 2019
5. Applying Social Network Analysis to Software Fault-Proneness Prediction [D]. Li, Yihao. 2017
6. Survival prediction in mesothelioma using a scalable Lasso regression model: instructions for use and initial performance using clinical predictors [O]. Andrew C Kidd, Michael McGettrick, Selina Tsim. 2018
7. Software Metrics Reduction for Fault-Proneness Prediction of Software Modules [O]. Yunfeng Luo, Kerong Ben, Lei Mi. 2010
