Journal of Software

A Novel Approach for Converting N-Dimensional Dataset into Two Dimensions to Improve Accuracy in Software Defect Prediction

Abstract

A software defect prediction model is trained on code metrics and historical defect information to identify probable software defects. The accuracy and performance of a prediction model depend largely on the training dataset. To provide a proper training dataset, the data must be grouped into clusters with low variability using clustering algorithms. However, the clustering process is hampered by the many attributes of the dataset, such as Coupling Between Objects, Response for Class, and Lines of Code. This research aims to predict software defects by reducing the code metric dimensions to two latent variables, which in turn helps the clustering algorithms group data properly for the defect prediction model. In this paper, dataset similarities are analyzed by reducing the code metric attributes to two latent variables based on their impact on defects. That impact can be analyzed using regression analysis, because it identifies the relationships among a set of dependent and independent variables. The code metrics are then merged into two variables, PosImpactValue and NegImpactValue, based on their positive or negative impact, respectively. As a result, a multi-dimensional dataset is mapped to a two-dimensional dataset. Plotting the dimension-reduced datasets enables distance-based clustering algorithms to group the data based on similarity. Experiments were performed on 18 releases of 6 open source software projects: jEdit, Ant, Xalan, Synapse, Tomcat, and Camel. For comparative analysis, one of the most commonly used dimension reduction techniques, Principal Component Analysis (PCA), and two popular clustering techniques in defect prediction, DBSCAN and WHERE, were used in the experiments. First, the dimensions of the experimental datasets were reduced using the proposed technique and PCA separately. Then the reduced datasets were clustered using DBSCAN and WHERE independently to identify the number of defects accurately. The comparative analysis shows that defect prediction models based on these clustering algorithms are more accurate on datasets reduced by the proposed technique than on those reduced by PCA.
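
The reduction step described in the abstract can be sketched roughly as follows. This is only an illustration under several assumptions that the abstract does not confirm: each metric's impact is taken from the sign of a simple one-predictor least-squares slope against the defect count, metrics are min-max scaled before merging, and merging is a plain sum. The function names, the sample values, and the metric called `hypothetical_metric` are invented for the example and are not taken from the paper.

```python
from statistics import mean


def slope(xs, ys):
    """Least-squares slope of ys regressed on a single predictor xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0  # a constant metric carries no signal
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def min_max(xs):
    """Scale values to [0, 1] so that no metric dominates the sum (assumption)."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def reduce_to_two(metrics, defects):
    """Map each class to a (PosImpactValue, NegImpactValue) pair.

    metrics: dict of metric name -> list of values, one per class
    defects: list of defect counts, one per class
    """
    n = len(defects)
    pos_impact = [0.0] * n
    neg_impact = [0.0] * n
    slopes = {}
    for name, values in metrics.items():
        b = slope(values, defects)
        slopes[name] = b
        if b == 0:
            continue  # no measurable impact: leave the metric out
        target = pos_impact if b > 0 else neg_impact
        for i, v in enumerate(min_max(values)):
            target[i] += v
    return list(zip(pos_impact, neg_impact)), slopes


# Toy data for five classes (invented values, for illustration only).
metrics = {
    "cbo": [1, 4, 2, 8, 6],                  # Coupling Between Objects
    "loc": [10, 30, 15, 60, 40],             # Lines of Code
    "hypothetical_metric": [9, 4, 8, 1, 3],  # falls as defects rise
}
defects = [0, 1, 0, 3, 2]

points, slopes = reduce_to_two(metrics, defects)
for point, d in zip(points, defects):
    print(point, "defects:", d)
```

Each class is now a point in two dimensions, so a distance-based clustering algorithm such as DBSCAN could be applied to `points` directly. Whether the paper uses univariate or multiple regression, and how it weights metrics when merging them, is not stated in the abstract.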
