Impact of Data Sampling on Stability of Feature Selection for Software Measurement Data



Abstract

Software defect prediction can be considered a binary classification problem. Generally, practitioners use historical software data, including metric and fault data collected during the software development process, to build a classification model and then employ this model to classify new program modules as either fault-prone (fp) or not-fault-prone (nfp). Limited project resources can then be allocated according to the prediction results, for example by assigning more reviews and testing to the modules predicted to be potentially defective. Two challenges often arise in the modeling process: (1) the high dimensionality of software measurement data and (2) skewed or imbalanced distributions between the two types of modules (fp and nfp) in those datasets. To overcome these problems, extensive studies have been dedicated to improving the quality of training data. The commonly used techniques are feature selection and data sampling. Usually, researchers focus on evaluating classification performance after the training data is modified. The present study assesses a feature selection technique from a different perspective. We are more interested in the stability of a feature selection method, and especially in the impact of data sampling techniques on that stability when features are selected from the sampled data. Two case studies performed on datasets from two real-world software projects yield several interesting findings.
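The abstract does not say which sampling techniques, feature rankers, or stability measure the study uses. The following Python sketch is therefore only an illustration of the general idea, under assumptions that are not taken from the paper: random undersampling of the nfp class, a simple mean-difference filter for ranking features, average pairwise Jaccard similarity of the selected subsets as the stability score, and synthetic data. All function names are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def make_data(n_nfp=180, n_fp=20, n_features=10, seed=0):
    """Synthetic imbalanced data (assumption): features 0-2 carry signal, the rest are noise."""
    rng = random.Random(seed)
    rows = []
    for label, count in ((0, n_nfp), (1, n_fp)):
        for _ in range(count):
            x = [rng.gauss(1.5 * label if j < 3 else 0.0, 1.0)
                 for j in range(n_features)]
            rows.append((x, label))
    return rows


def undersample(rows, rng):
    """Random undersampling: keep all fp rows and an equal-sized random draw of nfp rows."""
    fp = [r for r in rows if r[1] == 1]
    nfp = [r for r in rows if r[1] == 0]
    return fp + rng.sample(nfp, len(fp))


def top_k_features(rows, k):
    """Rank features by absolute difference of class means and keep the top k indices."""
    n_features = len(rows[0][0])
    scores = []
    for j in range(n_features):
        fp_mean = mean(x[j] for x, y in rows if y == 1)
        nfp_mean = mean(x[j] for x, y in rows if y == 0)
        scores.append((abs(fp_mean - nfp_mean), j))
    return frozenset(j for _, j in sorted(scores, reverse=True)[:k])


def jaccard(a, b):
    """Jaccard similarity of two feature subsets."""
    return len(a & b) / len(a | b)


def stability(rows, k=3, runs=10, seed=1):
    """Average pairwise Jaccard similarity of the subsets selected on repeated samples."""
    rng = random.Random(seed)
    subsets = [top_k_features(undersample(rows, rng), k) for _ in range(runs)]
    return mean(jaccard(a, b) for a, b in combinations(subsets, 2))


if __name__ == "__main__":
    data = make_data()
    print(f"stability = {stability(data):.3f}")
```

A score near 1 means the selected subset barely changes from one sampled dataset to the next; a score near 0 means the sampling step largely determines which features are chosen.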


Bibliographic record

Source: 2011 23rd IEEE International Conference on Tools with Artificial Intelligence, 2011, pp. 1004-1011 (8 pages)
Authors: Gao Kehan; Khoshgoftaar Taghi M.; Napolitano Amri
Original format: PDF
Chinese Library Classification: Artificial intelligence theory
Keywords: data sampling; defect prediction; feature selection; software metrics; stability


Similar literature


1. Assessments of Feature Selection Techniques with Respect to Data Sampling for Highly Imbalanced Software Measurement Data [J]. Kehan Gao, Taghi M. Khoshgoftaar. International Journal of Reliability, Quality and Safety Engineering. 2015, no. 2
2. Aggregating Data Sampling with Feature Subset Selection to Address Skewed Software Defect Data [J]. Kehan Gao, Taghi M. Khoshgoftaar, Amri Napolitano. International Journal of Software Engineering and Knowledge Engineering. 2015, no. 9a10
3. Comparing Feature Selection Techniques for Software Quality Estimation Using Data-Sampling-Based Boosting Algorithms [J]. Taghi M. Khoshgoftaar, Kehan Gao, Ye Chen. International Journal of Reliability, Quality and Safety Engineering. 2015, no. 3
4. Impact of Data Sampling on Stability of Feature Selection for Software Measurement Data [C]. Gao Kehan, Khoshgoftaar Taghi M., Napolitano Amri. International Conference on Tools with Artificial Intelligence. 2011
5. On Feature Selection Stability: A Data Perspective [D]. Alelyani, Salem. 2013
6. Measuring Stability of Feature Selection in Biomedical Datasets [O]. Jonathan L. Lustgarten, Vanathi Gopalakrishnan, Shyam Visweswaran. 2009
7. Software Defect Prediction Based on Data Sampling and Multivariate Filter Feature Selection [O]. Yating Lin, Yiwen Zhong. 2018
8. Science of Test Measurement Accuracy - Data Sampling and Filter Selection during Data Acquisition [R]. Kidman, D. S. 2015
