Journal: Journal of software: evolution and process

An empirical comparison of validation methods for software prediction models


Abstract

Model validation methods (e.g., k-fold cross-validation) use historical data to predict how well an estimation technique (e.g., random forest) performs on current (or future) data. Studies in the contexts of software development effort estimation (SDEE) and software fault prediction (SFP) have used and investigated different model validation methods. However, there are no conclusive indications of which model validation method has a major impact on the prediction accuracy and stability of estimation techniques. Some studies have investigated model validation methods specific to data about either SDEE or SFP. To the best of our knowledge, there is no study in the literature that has employed different validation methods with both SDEE and SFP data. The aim of this paper is to consider 10 different methods from the families of cross-validation (CV) and bootstrap validation to identify which one contributes to obtaining better prediction accuracy for both types of data. We also evaluate which model validation methods allow the estimation techniques to provide stable performance (i.e., with lower variance). To this aim, we present an empirical study involving six datasets from the SDEE domain and six datasets from the SFP domain. The results reveal that repeated 10-fold CV with SDEE data and optimistic boot with SFP data are the model validation methods that provide better prediction accuracy in a greater number of experiments than the other model validation methods. Furthermore, a model validation method can improve the prediction accuracy by up to 60% with SDEE data and by up to 36% with SFP data. The analysis also reveals that repeated fivefold CV produces more stable performance when the experiments are repeated on the same data.
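To make the idea of repeated k-fold CV and its stability concrete, the following is a minimal sketch in plain Python. It is not the study's protocol: the synthetic size/effort data, the one-feature least-squares model, the mean absolute error metric, and the helper names (`k_fold_indices`, `fit_line`, `repeated_k_fold_mae`) are all assumptions made for illustration. It repeats 10-fold CV several times on the same data and reports the variance of the error across repetitions, which is one simple way to read "stable performance".

```python
import random
import statistics


def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def repeated_k_fold_mae(xs, ys, k=10, repeats=5, seed=0):
    """Return one mean absolute error per repetition of k-fold CV."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        abs_errors = []
        for fold in k_fold_indices(len(xs), k, rng):
            held_out = set(fold)
            train = [i for i in range(len(xs)) if i not in held_out]
            a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
            # Score only the held-out fold with the model fitted on the rest.
            abs_errors.extend(abs(ys[i] - (a + b * xs[i])) for i in fold)
        scores.append(statistics.fmean(abs_errors))
    return scores


# Synthetic data, invented for illustration: effort grows roughly linearly with size.
data_rng = random.Random(42)
sizes = [data_rng.uniform(1, 100) for _ in range(60)]
efforts = [3.0 * s + data_rng.gauss(0, 10) for s in sizes]

scores = repeated_k_fold_mae(sizes, efforts, k=10, repeats=5)
print("MAE per repetition:", [round(s, 2) for s in scores])
print("mean MAE:", round(statistics.fmean(scores), 2))
print("variance across repetitions:", round(statistics.variance(scores), 4))
```

Changing `k` (for example to 5) or `repeats` and comparing the reported variance gives a rough feel for how the choice of validation method affects stability on a given dataset; the conclusions reported in the abstract come from the paper's own experiments, not from this sketch.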
