Innovations in Systems and Software Engineering

A comparative analysis between two techniques for the prediction of software defects: fuzzy and statistical linear regression



Abstract

Software engineers should estimate the resources (time, people, and software tools, among others) necessary to satisfy software project requirements; this activity is carried out in the planning phase. The estimated development time is needed to establish the cost of a software project and to assign human resources to each of its phases. Most companies fail to finish software projects on time because of a poor estimation technique or the lack of one. The estimated time must account for the time spent removing the defects injected during each software phase. This paper presents a comparative analysis of two techniques for software defect estimation: fuzzy linear regression and statistical linear regression. The two techniques model uncertainty differently: statistical linear regression models it as randomness, whereas fuzzy linear regression models it as fuzziness. The main objective of this paper was to establish the kind of uncertainty associated with software defect prediction and to contrast the two prediction techniques. The NASA KC1 data set was used for this analysis; only six of its metrics, together with the lines-of-code metric, were used in the comparison. Descriptive statistics were first used to give an overview of the main characteristics of the data set. The linearity of the relationship between the predictor variables and the variable of interest, the number of defects, was checked using scatter plots and Pearson's correlation coefficient. Multicollinearity was then checked using the inter-correlations among the metrics and the variance inflation factor. Best subset regression was applied to find the most influential subset of predictor variables; this subset was later used to build the fuzzy and statistical regression models. The linear relationship between the metrics and the number of defects was confirmed.
Multicollinearity was not detected among the predictor variables. Best subset regression found that the subset of five variables was the most influential. The analysis showed that, in general, the statistical regression model outperformed the fuzzy regression model. Techniques for software defect prediction should be employed carefully in order to produce sound quality plans. Software engineers should consider and understand a range of prediction techniques and know their strengths and weaknesses. At least for the KC1 data set, the uncertainty in the software defect prediction model is due to randomness, so it is reasonable to build the prediction model with statistical linear regression rather than fuzzy linear regression.
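To make the contrast between the two kinds of uncertainty concrete, here is a minimal pure-Python sketch on hypothetical toy data (not the KC1 data set, and not the paper's implementation). It fits one predictor both ways: ordinary least squares treats the scatter around the line as random error, while a Tanaka-style possibilistic fit wraps every observation inside a fuzzy band. It also computes Pearson's r, the linearity check described in the abstract. The fuzzy part is a deliberately restricted variant chosen as an assumption: only the intercept is fuzzy (one constant spread `c`), and the h-level parameter `h` is illustrative. Under that restriction, minimizing the total spread reduces to a minimax (Chebyshev) line fit, which can be solved exactly by checking the slopes through pairs of points instead of calling a linear-programming solver. All function names are hypothetical.

```python
"""OLS vs. a restricted Tanaka-style fuzzy line fit on hypothetical toy data."""
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson's correlation coefficient (a simple linearity screen)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def ols_line(xs, ys):
    """Least-squares intercept and slope; residuals are treated as random error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def _band(xs, ys, slope):
    """For a fixed slope, return the mid-range intercept and the half-width
    of the residuals y - slope * x (the smallest symmetric band covering them)."""
    r = [y - slope * x for x, y in zip(xs, ys)]
    return (max(r) + min(r)) / 2, (max(r) - min(r)) / 2


def fuzzy_line(xs, ys, h=0.0):
    """Possibilistic fit Y = (a0, c) + a1 * x with ONLY the intercept fuzzy.

    Hypothetical simplification of Tanaka's model: the intercept is a symmetric
    triangular fuzzy number with centre a0 and spread c; the slope is crisp.
    Each observation must have membership >= h, i.e.
    |y_i - a0 - a1 * x_i| <= (1 - h) * c, and the total spread is minimised.
    With one constant spread this is a minimax (Chebyshev) line fit, and an
    optimum occurs at the slope through some pair of observations.
    """
    if not 0.0 <= h < 1.0:
        raise ValueError("h must satisfy 0 <= h < 1")
    slopes = {(yj - yi) / (xj - xi)
              for (xi, yi), (xj, yj) in combinations(zip(xs, ys), 2)
              if xj != xi}
    if not slopes:
        raise ValueError("need at least two distinct x values")
    # Each candidate is (intercept, half_width, slope); keep the narrowest band.
    a0, half_width, a1 = min((_band(xs, ys, s) + (s,) for s in slopes),
                             key=lambda t: t[1])
    return a0, a1, half_width / (1.0 - h)


if __name__ == "__main__":
    xs = [0, 1, 2, 3]  # hypothetical metric values
    ys = [1, 3, 2, 5]  # hypothetical defect counts
    print("Pearson r:", round(pearson_r(xs, ys), 3))
    print("OLS (a0, a1):", ols_line(xs, ys))
    print("Fuzzy (a0, a1, c):", fuzzy_line(xs, ys))
```

On this toy data the fuzzy fit returns the centre line 1 + x with spread 1.0, so every point lies within ±1 of the centre, while OLS returns an intercept and slope of approximately 1.1 each. A full Tanaka model also gives each slope coefficient its own spread and is usually solved as a linear program.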
