Analyzing data sets with missing data: an empirical evaluation of imputation methods and likelihood-based methods

Myrtveit I.; Stensrud E.; Olsson U.H.

Abstract

Missing data are often encountered in data sets used to construct software effort prediction models. Thus far, the common practice has been to ignore observations with missing data. This may result in biased prediction models. The authors evaluate four missing data techniques (MDTs) in the context of software cost modeling: listwise deletion (LD), mean imputation (MI), similar response pattern imputation (SRPI), and full information maximum likelihood (FIML). We apply the MDTs to an ERP data set, and thereafter construct regression-based prediction models using the resulting data sets. The evaluation suggests that only FIML is appropriate when the data are not missing completely at random (MCAR). Unlike FIML, prediction models constructed on LD, MI and SRPI data sets will be biased unless the data are MCAR. Furthermore, compared to LD, MI and SRPI seem appropriate only if the resulting LD data set is too small to enable the construction of a meaningful regression-based prediction model.
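The two simplest techniques named in the abstract, listwise deletion (LD) and mean imputation (MI), can be illustrated with a minimal Python sketch. The records, the variable names (`users`, `interfaces`, `effort`) and the helper functions below are hypothetical and chosen only for illustration; they are not taken from the paper's ERP data set, and the sketch does not reproduce the authors' procedure, SRPI, or FIML.

```python
from statistics import mean

# Hypothetical project records (not the paper's ERP data); None marks a missing value.
rows = [
    {"users": 100, "interfaces": 5, "effort": 400},
    {"users": 250, "interfaces": None, "effort": 900},
    {"users": None, "interfaces": 8, "effort": 650},
    {"users": 400, "interfaces": 12, "effort": 1500},
]


def listwise_deletion(rows):
    """LD: keep only the observations that have no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def mean_imputation(rows):
    """MI: replace each missing value with the mean of that variable's observed values."""
    columns = list(rows[0].keys())
    col_means = {
        c: mean([r[c] for r in rows if r[c] is not None]) for c in columns
    }
    return [
        {c: (r[c] if r[c] is not None else col_means[c]) for c in columns}
        for r in rows
    ]


ld_rows = listwise_deletion(rows)  # smaller data set, complete cases only
mi_rows = mean_imputation(rows)    # same size as the input, gaps filled with column means

print(len(rows), len(ld_rows), len(mi_rows))
```

LD shrinks the data set, which is the trade-off the abstract points to when it says MI and SRPI seem appropriate only if the LD data set is too small for a meaningful regression model. Neither helper addresses the bias that the abstract reports when the data are not MCAR.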

Bibliographic details

Source: IEEE Transactions on Software Engineering | 2001, Issue 11 | pp. 999-1013 | 15 pages
Authors: Myrtveit I.; Stensrud E.; Olsson U.H.
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Chinese Library Classification: Computer software
Keywords: data analysis; maximum likelihood estimation; software cost estimation; statistical analysis; ERP data set; FIML; LD; MCAR; MDTs; MI; SRPI data sets; biased prediction models; data set analysis; information maximum likelihood; listwise deletion; mean imputation; missing;
Record added: 2022-08-17 13:41:49
