An Improved Method for Cross-Project Defect Prediction by Simplifying Training Data

Peng He; Yao He; Lvjun Yu; Bing Li

Abstract

Cross-project defect prediction (CPDP) on projects with limited historical data has attracted much attention. To the best of our knowledge, however, existing approaches usually perform poorly because of low-quality cross-project training data. This study proposes an improved method for CPDP that simplifies the training data, named TDSelector, which considers both the similarity of each training instance and the number of defects it has (denoted by defects), and demonstrates the effectiveness of the proposed method. Our work consists of three main steps. First, we constructed TDSelector as a linear weighted function of instances’ similarity and defects. Second, we built the basic defect predictor used in our experiments with the Logistic Regression classification algorithm. Third, we analyzed the impact of different combinations of similarity measures and normalization methods for defects on prediction performance and then compared TDSelector with two existing methods. We evaluated our method on 14 projects collected from two public repositories. The results suggest that TDSelector performs, on average, better than both baseline methods, increasing AUC values by up to 10.6% and 4.3%, respectively. That is, including defects indeed helps select high-quality training instances for CPDP. Moreover, the combination of Euclidean distance and linear normalization is the preferred configuration for TDSelector. An additional experiment also shows that directly selecting the instances with more bugs as training data can further improve the performance of the bug predictor trained by our method.
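
The abstract describes TDSelector only as a linear weighted function of similarity and defects, with Euclidean distance and linear normalization as the preferred combination. A minimal sketch of that idea follows. The exact formula is not given here, so several details are assumptions made for illustration: the weight `alpha`, the mapping from distance to similarity (`1 / (1 + d)` to the nearest target instance), the top-`k` cutoff, and all function names are hypothetical and may differ from the authors' definitions.

```python
import math


def min_max(values):
    """Linear (min-max) normalization to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def score_instances(train_X, train_defects, target_X, alpha=0.5):
    """Score each training instance as alpha * similarity + (1 - alpha) * defects.

    Assumption: similarity is 1 / (1 + distance to the nearest target instance),
    and defect counts are min-max normalized before weighting.
    """
    sims = [1.0 / (1.0 + min(euclidean(x, t) for t in target_X)) for x in train_X]
    norm_defects = min_max(train_defects)
    return [alpha * s + (1.0 - alpha) * d for s, d in zip(sims, norm_defects)]


def select_top_k(train_X, train_defects, target_X, k, alpha=0.5):
    """Return the indices of the k highest-scoring training instances."""
    scores = score_instances(train_X, train_defects, target_X, alpha)
    ranked = sorted(range(len(train_X)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy data: three cross-project instances, one target-project instance.
train_X = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]
train_defects = [0, 2, 2]
target_X = [[0.0, 0.0]]
selected = select_top_k(train_X, train_defects, target_X, k=2)
```

In this toy run, the third instance is both close to the target and defect-rich, so it ranks first. The selected instances would then be used to train a classifier such as Logistic Regression, which the abstract names as the basic predictor.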

Bibliographic record

Source: Mathematical Problems in Engineering: Theory, Methods and Applications, 2018
Authors: Peng He; Yao He; Lvjun Yu; Bing Li
Original format: PDF
Chinese Library Classification: Engineering mathematics
