IEEE Transactions on Software Engineering

Analyzing data sets with missing data: an empirical evaluation of imputation methods and likelihood-based methods




Missing data are often encountered in data sets used to construct software effort prediction models. Thus far, the common practice has been to ignore observations with missing data. This may result in biased prediction models. The authors evaluate four missing data techniques (MDTs) in the context of software cost modeling: listwise deletion (LD), mean imputation (MI), similar response pattern imputation (SRPI), and full information maximum likelihood (FIML). We apply the MDTs to an ERP data set, and thereafter construct regression-based prediction models using the resulting data sets. The evaluation suggests that only FIML is appropriate when the data are not missing completely at random (MCAR). Unlike FIML, prediction models constructed on LD, MI and SRPI data sets will be biased unless the data are MCAR. Furthermore, compared to LD, MI and SRPI seem appropriate only if the resulting LD data set is too small to enable the construction of a meaningful regression-based prediction model.
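To make the two simplest techniques named in the abstract concrete, the following is a minimal sketch of listwise deletion and mean imputation. It is not the authors' implementation: the `projects` records, the column names, and both helper functions are hypothetical and chosen only for illustration. SRPI and FIML are not shown.

```python
from statistics import mean

# Hypothetical project records (not from the paper); None marks a missing value.
projects = [
    {"size": 100.0, "experience": 3.0, "effort": 520.0},
    {"size": 250.0, "experience": None, "effort": 1300.0},
    {"size": None, "experience": 5.0, "effort": 400.0},
    {"size": 400.0, "experience": 2.0, "effort": 2100.0},
]

def listwise_deletion(rows):
    """Keep only the rows that have no missing values (LD)."""
    return [dict(r) for r in rows if all(v is not None for v in r.values())]

def mean_imputation(rows):
    """Replace each missing value with the mean of the observed values in its column (MI)."""
    columns = list(rows[0].keys())
    col_means = {c: mean(r[c] for r in rows if r[c] is not None) for c in columns}
    return [
        {c: (r[c] if r[c] is not None else col_means[c]) for c in columns}
        for r in rows
    ]

ld = listwise_deletion(projects)   # drops every record with a gap
mi = mean_imputation(projects)     # keeps every record, fills the gaps
print(len(ld), len(mi))
```

In this toy data LD discards half of the records while MI keeps all of them, which mirrors the trade-off the abstract describes: imputation becomes attractive mainly when LD leaves too few observations to fit a regression model.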


