A Novel Approach for Converting N-Dimensional Dataset into Two Dimensions to Improve Accuracy in Software Defect Prediction

Rayhanul Islam; Abdus Satter; Atish Kumar Dipongkor; Md. Saeed Siddik; Kazi Sakib

Abstract

A software defect prediction model is trained using code metrics and historical defect information to identify probable software defects. The accuracy and performance of a prediction model largely depend on the training dataset. To provide a suitable training dataset, the data must first be grouped into clusters with low variability using clustering algorithms. However, the clustering process is hampered by the many attributes of the dataset, such as Coupling between Objects, Response for Class, and Lines of Code. This research aims to predict software defects by reducing the code metric dimensions to two latent variables, which in turn helps clustering algorithms group the data properly for the defect prediction model. In this paper, dataset similarities are analyzed by reducing the code metrics' attributes to two latent variables based on their impact on defects. This impact can be analyzed using regression analysis, because regression identifies the relationship between a set of dependent and independent variables. The code metrics are then merged into two variables, PosImpactValue and NegImpactValue, according to whether their impact is positive or negative, respectively. As a result, the multi-dimensional dataset is mapped onto a two-dimensional dataset. Plotting these dimension-reduced datasets enables distance-based clustering algorithms to group them by similarity. Experiments were performed on 18 releases of 6 open source software datasets: jEdit, Ant, Xalan, Synapse, Tomcat and Camel. For comparative analysis, one of the most commonly used dimension reduction techniques, Principal Component Analysis (PCA), and two popular clustering techniques in defect prediction, DBSCAN and WHERE, were used in the experiment. First, the dimensions of the experimental datasets were reduced using the proposed technique and PCA separately. Then, the reduced datasets were clustered using DBSCAN and WHERE independently to identify the number of defects accurately. The comparative analysis shows that defect prediction models based on the clustering algorithms are more accurate for datasets reduced by the proposed technique than for datasets reduced by PCA.
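The reduction step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which regression model is used or how the metrics are merged, so the choices below are assumptions and not the authors' method: each metric is min-max normalized, the sign of its simple least-squares slope against the defect count decides its group, and the normalized values in each group are summed. The function names, the toy data, and the fourth "cohesion" column are hypothetical.

```python
from statistics import mean


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs (simple linear regression)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0  # a constant metric gives no impact signal
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def min_max(xs):
    """Scale values to [0, 1] so metrics with large ranges do not dominate."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def reduce_to_two(rows, defects):
    """Map each metric vector to (PosImpactValue, NegImpactValue).

    Assumption: a metric's impact sign is the sign of its regression slope
    against the defect count, and merging is a plain sum of normalized values.
    """
    columns = [min_max(list(col)) for col in zip(*rows)]
    slopes = [slope(col, defects) for col in columns]
    points = []
    for i in range(len(rows)):
        pos = sum(col[i] for col, s in zip(columns, slopes) if s > 0)
        neg = sum(col[i] for col, s in zip(columns, slopes) if s < 0)
        points.append((pos, neg))
    return points, slopes


# Hypothetical toy data, one row per class: CBO, RFC, LOC, cohesion score.
rows = [
    [2, 10, 100, 9],
    [4, 20, 200, 7],
    [6, 30, 300, 5],
    [8, 40, 400, 3],
]
defects = [0, 1, 2, 3]

points, slopes = reduce_to_two(rows, defects)
for metrics, point in zip(rows, points):
    print(metrics, "->", point)
```

The resulting two-dimensional points could then be passed to a distance-based clustering algorithm such as DBSCAN. Normalizing before summing is a design choice made here so that Lines of Code does not swamp the smaller-valued metrics; the paper may handle scale differently.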

Bibliographic details

Source: Journal of Software, 2020, Issue 6, 16 pages
Authors: Rayhanul Islam; Abdus Satter; Atish Kumar Dipongkor; Md. Saeed Siddik; Kazi Sakib
Original format: PDF
Keywords: software defect prediction; principal component analysis; DBSCAN; WHERE clustering; code metrics' dimension reduction technique; dataset pre-processing

