Journal: IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans

Software Quality Analysis of Unlabeled Program Modules With Semisupervised Clustering

Abstract

Software quality assurance is a vital component of software project development. A software quality estimation model is trained using software measurement and defect (software quality) data of a previously developed release or similar project. Such an approach assumes that the development organization has experience with systems similar to the current project and that defect data are available for all modules in the training data. In software engineering practice, however, various practical issues limit the availability of defect data for modules in the training data. In addition, the organization may not have experience developing a similar system. In such cases, the task of software quality estimation, or labeling modules as fault prone or not fault prone, falls on the expert. We propose a semisupervised clustering scheme for software quality analysis of program modules with no defect data or quality-based class labels. It is a constraint-based semisupervised clustering scheme that uses k-means as the underlying clustering algorithm. Software measurement data sets obtained from multiple National Aeronautics and Space Administration (NASA) software projects are used in our empirical investigation. The proposed technique is shown to help the expert make better estimations than the predictions obtained when the expert labels the clusters formed by an unsupervised learning algorithm. In addition, the software quality knowledge learnt during the semisupervised process provided good generalization performance for multiple test data sets. An analysis of the program modules that remain unlabeled after our semisupervised clustering scheme provided useful insight into the characteristics of their software attributes.
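
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common form of constraint-based k-means (seeded, constrained k-means), not necessarily the authors' exact scheme. It assumes that an expert has labeled a few modules, that those labeled modules are pinned to their cluster, and that the remaining modules are assigned to the nearest centroid. The function name `constrained_kmeans`, the `seeds` mapping, and the two-metric data are hypothetical and chosen purely for illustration.

```python
from math import dist


def constrained_kmeans(points, seeds, k=2, max_iter=100):
    """Seeded, constrained k-means (illustrative sketch).

    points: list of software-metric vectors, one per module.
    seeds:  dict mapping module index -> cluster id given by the expert.
    Returns (assignment, centroids).
    """
    dim = len(points[0])

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    # Seed each centroid with the mean of the modules labeled for that cluster.
    centroids = []
    for c in range(k):
        members = [points[i] for i, label in seeds.items() if label == c]
        if not members:
            raise ValueError(f"cluster {c} needs at least one labeled module")
        centroids.append(mean(members))

    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            seeds[i] if i in seeds  # constraint: labeled modules never move
            else min(range(k), key=lambda c: dist(p, centroids[c]))
            for i, p in enumerate(points)
        ]
        if new_assignment == assignment:  # converged
            break
        assignment = new_assignment
        # Every cluster keeps its seeds, so no cluster can become empty.
        centroids = [
            mean([p for p, a in zip(points, assignment) if a == c])
            for c in range(k)
        ]
    return assignment, centroids


# Made-up data: [size metric, complexity metric] per module.
modules = [[1, 1], [2, 1], [9, 9], [10, 8], [1.5, 2], [9, 10]]
# Assumption: cluster 0 = not fault prone, cluster 1 = fault prone.
expert_labels = {0: 0, 2: 1}
assignment, centroids = constrained_kmeans(modules, expert_labels, k=2)
print(assignment)
```

In this sketch the expert's labels act as hard constraints: they fix both the initial centroids and the membership of the labeled modules, while k-means itself only decides where the unlabeled modules go.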