Improving predictive models of software quality using search-based metric selection and decision trees.

Abstract

Software engineering is a human-centric endeavour where the majority of the effort is spent understanding and modifying source code. The ability to automatically identify potentially problematic components would help developers and project managers make the best use of limited resources when taking mitigating actions such as detailed code inspections, more exhaustive testing, refactoring or reassignment to more experienced developers. Predictive models can be used to discover poor-quality components via structural information from the design and/or source code.

In machine learning, high-dimensional feature spaces may contain inputs that are irrelevant or redundant. Feature selection is the process of identifying a subset of features that improves a classifier's discriminatory performance. In the analysis of software systems, the features used are source code metrics. In this work, an analysis tool has been developed that implements a parallel genetic algorithm (GA) as a search-based metric selection strategy. A comparative study has been carried out between GA, the Chidamber and Kemerer (CK) metrics suite (for an object-oriented dataset), and principal component analysis (PCA) as metric selection strategies with different datasets.

Program comprehension is important for programmers, and the first dataset evaluated uses source code inspections as a subjective measure of the cognitive complexity that degrades program understanding. Predicting the likely location of system failures is important in order to improve a system's reliability. The second dataset uses an objective measure of faults found in system modules in order to predict fault-prone components.

The aim of this research has been to advance the current state of the art in predictive models of software quality by exploring the efficacy of a search-based approach in selecting appropriate metric subsets for various predictive objectives. Results show that a search-based strategy, such as GA, performs well as a metric selection strategy when used with a linear discriminant analysis classifier. When predicting cognitively complex classes, GA achieved an F-value of 0.845 compared to an F-value of 0.740 using principal component analysis, and 0.750 when using only the CK metrics suite.

There exist many traditional source code metrics to capture the size, algorithmic complexity, cohesion and coupling of modules. Object-oriented systems have introduced additional structural concepts such as encapsulation and inheritance, providing even more ways to capture and measure different aspects of coupling, cohesion, complexity and size. An important question to answer is: Which metrics should be used with a model for a particular predictive objective?

By examining the GA-chosen metrics with a white-box predictive model (a decision tree classifier), additional insights into the structural properties of a system that degrade product quality were observed. Source code metrics have been designed for human understanding and program comprehension, and predictive models for cognitive complexity perform well with just source code metrics. Models for fault-prone modules do not perform as well when using only source code metrics and need additional non-source-code information, such as module modification history or testing history.
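
The search-based metric selection described in the abstract can be illustrated with a minimal sketch. It is not the thesis's tool: the abstract describes a parallel GA paired with a linear discriminant analysis classifier, whereas this sketch is a small serial GA that swaps in a nearest-centroid classifier to stay dependency-free. The data are synthetic, and every name (`make_data`, `ga_select`, `fitness`, the population size, the mutation rate) is a hypothetical choice made for illustration. Each individual is a boolean mask over the metrics, and its fitness is the F-measure obtained when classifying with only the selected metrics.

```python
import random

def f_measure(y_true, y_pred):
    """F-measure (harmonic mean of precision and recall) for class 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def centroid_predict(train_X, train_y, test_X, mask):
    """Nearest-centroid classifier (stand-in for LDA) on the masked metrics."""
    cols = [j for j, keep in enumerate(mask) if keep]
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    def dist(x, c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(cols, c))
    return [min((0, 1), key=lambda lab: dist(x, centroids[lab])) for x in test_X]

def fitness(mask, train, test):
    """F-measure on the held-out set using only the selected metrics."""
    if not any(mask):
        return 0.0  # an empty metric subset cannot classify anything
    pred = centroid_predict(train[0], train[1], test[0], mask)
    return f_measure(test[1], pred)

def ga_select(train, test, n_metrics, pop_size=20, generations=30,
              p_mut=0.1, seed=0):
    """Evolve boolean masks over the metrics; return (best_mask, best_fitness)."""
    rng = random.Random(seed)
    pop = [[True] * n_metrics]  # seed the population with the all-metrics baseline
    pop += [[rng.random() < 0.5 for _ in range(n_metrics)]
            for _ in range(pop_size - 1)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda m: fitness(m, train, test), reverse=True)
        next_pop = scored[:2]  # elitism: the two best masks survive unchanged
        while len(next_pop) < pop_size:
            a, b = rng.sample(scored[:pop_size // 2], 2)  # parents from top half
            cut = rng.randrange(1, n_metrics)             # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=lambda m: fitness(m, train, test))
    return best, fitness(best, train, test)

def make_data(n, rng):
    """Synthetic stand-in data: 2 informative metrics followed by 4 noise metrics."""
    X, y = [], []
    for _ in range(n):
        label = rng.randrange(2)
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 3.0) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)
    return X, y

rng = random.Random(42)
train = make_data(120, rng)
test = make_data(60, rng)
best_mask, best_f = ga_select(train, test, n_metrics=6)
baseline_f = fitness([True] * 6, train, test)
print(best_mask, round(best_f, 3), round(baseline_f, 3))
```

Because the all-metrics mask is placed in the initial population and the best masks are carried forward unchanged, the selected subset never scores below the all-metrics baseline on the set used for fitness. That same set drives the search, so the reported score is optimistic. A careful study would report the final F-measure on data the GA never saw.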

Bibliographic record

  • Author

    Vivanco, Rodrigo

  • Author affiliation

    University of Manitoba (Canada)

  • Degree-granting institution University of Manitoba (Canada)
  • Discipline Computer Science
  • Degree Ph.D.
  • Year 2010
  • Pages 152
  • Original format PDF
  • Language English

  • Date indexed 2022-08-17 11:37:29
