International Journal of Electrical and Computer Engineering

A New Paradigm for Development of Data Imputation Approach for Missing Value Estimation

Abstract

Missing data values are a common problem in real-world data analysis and a challenging one in applications such as wireless sensor networks, medicine, and psychology. Learning and prediction in the presence of missing values can be treacherous in machine learning, data mining, and statistical analysis. A missing value can itself signify important information about the dataset, so handling missing values is a challenging part of the data mining process. In this paper, we propose a new paradigm for developing a data imputation method that estimates missing values based on centroids and nearest neighbours. First, clusters are identified with the k-means algorithm, and the centroids and nearest-neighbour data records are calculated. Second, the distances from the centroids to the records of both the complete and the incomplete dataset are computed and the nearest data record is estimated, a step that is subject to the curse of dimensionality. Finally, the missing value is imputed from the nearest-neighbour record using a statistical measure, the z-score. The experimental study demonstrates the strength of the proposed paradigm for missing value estimation. Tests were run on different types of datasets to validate the approach and to compare its results with other imputation methods such as KNNI, SVMI, WKNNI, KMI, and FKNNI. The proposed approach is geared towards maximizing the utility of imputation with respect to missing value estimation.
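The three steps named in the abstract (k-means clustering, nearest-record search from the centroids, imputation from the nearest neighbour) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the z-score is applied, so the sketch assumes it is used to standardise features before distances are computed, and that the missing value is copied from the nearest complete record in the nearest cluster. The function names (`impute`, `kmeans`, `zscore_params`, `dist`), the deterministic seeding, and the sample records are all hypothetical.

```python
import math

def zscore_params(rows):
    """Per-column mean and standard deviation of the complete rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero spread
    return means, stds

def dist(a, b, cols):
    """Euclidean distance restricted to the given column indices."""
    return math.sqrt(sum((a[j] - b[j]) ** 2 for j in cols))

def kmeans(points, k, iters=20):
    """Plain k-means, seeded with the first k points (assumption, for determinism)."""
    d = len(points[0])
    cols = range(d)
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centroids[c], cols))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(m[j] for m in members) / len(members)
                                for j in range(d)]
    return centroids, labels

def impute(records, k=2):
    """Fill None entries from the nearest complete record in the nearest cluster."""
    complete = [r for r in records if None not in r]
    means, stds = zscore_params(complete)

    def z(row):
        # Assumed role of the z-score: put every feature on a common scale.
        return [None if v is None else (v - means[j]) / stds[j]
                for j, v in enumerate(row)]

    z_complete = [z(r) for r in complete]
    centroids, labels = kmeans(z_complete, k)
    filled = []
    for r in records:
        if None not in r:
            filled.append(list(r))
            continue
        zr = z(r)
        observed = [j for j, v in enumerate(r) if v is not None]
        # Step 2: nearest centroid, measured on the observed features only.
        c = min(range(k), key=lambda i: dist(zr, centroids[i], observed))
        members = [i for i, lab in enumerate(labels) if lab == c]
        members = members or list(range(len(complete)))  # empty-cluster fallback
        # Step 3: nearest complete record inside that cluster supplies the value.
        nn = min(members, key=lambda i: dist(zr, z_complete[i], observed))
        filled.append([complete[nn][j] if v is None else v
                       for j, v in enumerate(r)])
    return filled

# Hypothetical toy data: two well-separated groups and two incomplete records.
records = [
    [1.0, 2.0], [1.2, 1.9], [0.9, 2.1],
    [8.0, 9.0], [8.2, 9.1], [7.9, 8.8],
    [1.05, None], [None, 8.85],
]
filled = impute(records, k=2)
```

Measuring distances on the observed features only is one simple way to compare an incomplete record with complete ones; other readings of the abstract, such as rescaling the neighbour's value with cluster-level z-scores, are equally possible.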