A Cluster Based Feature Selection Method for Cross-Project Software Defect Prediction

Chao Ni; Wang-Shu Liu; Xiang Chen; Qing Gu; Dao-Xu Chen; Qi-Guo Huang

Abstract

Cross-project defect prediction (CPDP) uses labeled data from external source software projects to compensate for the shortage of useful data in the target project, so that a meaningful classification model can be built. However, the distribution gap between the software features extracted from the source and target projects may be too large for the mixed data to be useful for training. In this paper, we propose a novel cluster-based method, FeSCH (Feature Selection Using Clusters of Hybrid-Data), which alleviates the distribution differences through feature selection. FeSCH has two phases. The feature clustering phase clusters features using a density-based clustering method, and the feature selection phase selects features from each cluster using a ranking strategy. For CPDP, we design three different heuristic ranking strategies for the second phase. To investigate the prediction performance of FeSCH, we design experiments based on real-world software projects and study the effects of the design options in FeSCH (ranking strategy, feature selection ratio, and classifier). The experimental results demonstrate the effectiveness of FeSCH. First, compared with state-of-the-art baseline methods, FeSCH achieves better performance, and its performance is less affected by the classifier used. Second, FeSCH improves performance by effectively selecting features across feature categories, and it provides guidelines for selecting useful features for defect prediction.
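
The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names neither the density-based clustering method nor the three ranking strategies, so both are replaced here by simple stand-ins. Phase 1 groups features whose correlation distance 1 - |r| on the hybrid (source plus target) data is at most a hypothetical threshold `eps`, and phase 2 ranks the features of each cluster by a hypothetical score that prefers features whose source and target means are close. All function names, parameters, and sample values are assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def cluster_features(columns, eps=0.3):
    """Phase 1 (assumed stand-in for the density-based clustering):
    link two features when 1 - |r| <= eps and return one cluster label
    per feature, taking connected components of the resulting graph."""
    n = len(columns)
    labels = [-1] * n
    cluster_id = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = cluster_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                close = 1 - abs(pearson(columns[i], columns[j])) <= eps
                if labels[j] == -1 and close:
                    labels[j] = cluster_id
                    stack.append(j)
        cluster_id += 1
    return labels


def similarity_score(src_col, tgt_col):
    """Phase 2 ranking (hypothetical heuristic): higher when the source and
    target means are closer, in units of the pooled standard deviation."""
    pooled = pstdev(list(src_col) + list(tgt_col)) or 1.0
    return -abs(mean(src_col) - mean(tgt_col)) / pooled


def select_features(source, target, ratio=0.5, eps=0.3):
    """Return the sorted indices of the features kept from every cluster.
    source and target are lists of rows sharing the same feature order."""
    src_cols = list(zip(*source))
    tgt_cols = list(zip(*target))
    # Hybrid data: each feature column holds source then target values.
    hybrid_cols = [s + t for s, t in zip(src_cols, tgt_cols)]
    labels = cluster_features(hybrid_cols, eps)
    selected = []
    for c in sorted(set(labels)):
        members = [i for i, label in enumerate(labels) if label == c]
        members.sort(key=lambda i: similarity_score(src_cols[i], tgt_cols[i]),
                     reverse=True)
        keep = max(1, math.ceil(ratio * len(members)))
        selected.extend(members[:keep])
    return sorted(selected)


# Made-up data: features 0 and 1 are correlated, as are features 2 and 3;
# features 1 and 2 are shifted between the source and target projects.
source = [[1, 1, 5, 10], [2, 2, 1, 2], [3, 3, 1, 2], [4, 4, 5, 10]]
target = [[1, 2, 7, 10], [2, 3, 3, 2], [3, 4, 3, 2], [4, 5, 7, 10]]
print(select_features(source, target, ratio=0.5))
```

In this sketch the selection ratio is applied inside each cluster, so every cluster contributes at least one feature; that per-cluster choice is also an assumption, since the abstract only states that features are selected from each cluster.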

Bibliographic details

Source
Journal of Computer Science and Technology (English edition) | 2017, Issue 6 | pp. 1090-1107 | 18 pages
Author affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China;

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China;

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China;

School of Computer Science and Technology, Nantong University, Nantong 226019, China;

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China;

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China;

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China;

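Indexed in: Chinese Science Citation Database (CSCD); China Scientific and Technical Papers and Citations Database (CSTPCD)
Original format: PDF
Language: English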