Comparative Study of Various Methods of Handling Missing Data

Fredrick Ochieng Odhiambo

Abstract

The scientific literature lacks a straightforward answer as to which of the existing methods of missing-data imputation is most suitable in terms of simplicity, accuracy and ease of use. This paper explores various methods of data imputation and then proposes a robust one. It uses simulated data sets generated from various distributions. A regression function is fitted to each simulated data set and its residual standard error is obtained. Values are then deleted at random from the set of independent variables to create artificial non-response, and suitable methods are used to impute the missing data. The methods considered are mean, regression, hot and cold decking, multiple and median imputation, listwise deletion, the EM algorithm and the nearest neighbour method. The paper investigates the three most common traditional methods of handling missing data to establish the optimal one. Suitability is determined by how little the characteristics of the imputed data sample vary from those of the original data set before imputation; this variation is measured using the regression intercept and the residual standard error. The R statistical package is used for most of the regression analyses. Microsoft Excel is used to determine the correlation between columns in the hot decking method, because it is readily available as a component of the Microsoft Office package. The results of the data analysis show intercept and R-squared values that closely mirror those of the original data sets, suggesting that median imputation is the better imputation method among the conventional methods. This finding matters for research, given how often data are missing in scientific studies. Finding and using the median is simple, so most researchers have a ready tool at hand for handling missing data.
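The evaluation procedure described in the abstract can be sketched in a few lines. The following is a minimal illustration only, not the author's code: the paper used R and Excel, whereas this sketch uses the Python standard library, and the sample size, missing rate, distribution parameters and function names (`fit_simple_regression`, `impute`, `run_experiment`) are all assumptions made for the example. It compares only mean and median imputation on a single predictor.

```python
import random
import statistics


def fit_simple_regression(x, y):
    """Least-squares fit of y = a + b*x.

    Returns (intercept, slope, residual standard error).
    """
    n = len(x)
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    rse = (rss / (n - 2)) ** 0.5  # n - 2 degrees of freedom
    return intercept, slope, rse


def impute(values, how):
    """Replace None entries with the mean or median of the observed entries."""
    observed = [v for v in values if v is not None]
    if how == "mean":
        fill = statistics.fmean(observed)
    else:
        fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def run_experiment(n=200, missing_rate=0.2, seed=0):
    """Fit on the full data, then refit after deleting and imputing x values."""
    rng = random.Random(seed)
    # Assumed simulation settings: normal predictor, linear response plus noise.
    x = [rng.gauss(50.0, 10.0) for _ in range(n)]
    y = [3.0 + 2.0 * xi + rng.gauss(0.0, 5.0) for xi in x]

    results = {"original": fit_simple_regression(x, y)}

    # Delete a random subset of the independent variable (artificial non-response).
    missing = set(rng.sample(range(n), int(missing_rate * n)))
    x_missing = [None if i in missing else xi for i, xi in enumerate(x)]

    for how in ("mean", "median"):
        results[how] = fit_simple_regression(impute(x_missing, how), y)
    return results


if __name__ == "__main__":
    for name, (a, b, rse) in run_experiment().items():
        print(f"{name:>8}: intercept={a:8.3f} slope={b:6.3f} RSE={rse:6.3f}")
```

Under the abstract's criterion, the method whose intercept and residual standard error stay closest to the `original` row would be preferred. A single seed and a single distribution say little on their own, so any comparison should be repeated across seeds and distributions.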

Bibliographic details

Source: Mathematical Modelling and Applications, 2020, Issue 2, 7 pages
Author: Fredrick Ochieng Odhiambo
Original format: PDF
Keywords: Regression; Nearest Neighbor; Hot Decking; Median Substitution; Missing Data
