Neurocomputing

A neural network-based framework for the reconstruction of incomplete data sets

Abstract

The treatment of incomplete data is an important step in data pre-processing. We propose a novel nonparametric algorithm, Generalized regression neural network Ensemble for Multiple Imputation (GEMI), and we also develop a single imputation (SI) version of this approach, GESI. We compare our algorithms with 25 popular missing-data imputation algorithms on 98 real-world and synthetic datasets at various percentages of missing values. The effectiveness of the algorithms is evaluated in terms of (i) output classification accuracy: three classifiers (a generalized regression neural network, a multilayer perceptron and a logistic regression model) are separately trained and tested on the dataset imputed by each imputation algorithm; (ii) interval analysis with missing observations; and (iii) point estimation accuracy of the imputed missing values. GEMI outperformed GESI and all the conventional imputation algorithms on all three criteria considered.

Abbreviations: EM, expectation maximization; GA, genetic algorithm; GRNN, generalized regression neural networks; HD, hot-deck imputation; HUX, half uniform crossover; KNN, K-nearest neighbours; MAR, missing at random; MCAR, missing completely at random; MCMC, Markov chain Monte Carlo; MI, multiple imputation; MLP, multilayer perceptrons; MNAR, missing not at random; MS, mean substitution; PCA, principal component analysis; PSO, particle swarm optimization; RBFN, radial basis function networks; SA, simulated annealing algorithm; SI, single imputation; WKNN, weighted K-nearest neighbours; ZI, zero imputation
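The abstract does not describe how GEMI builds or combines its networks, so the following is only a minimal sketch of the general idea of GRNN-based imputation, not the authors' algorithm. It assumes a Specht-style GRNN (a Gaussian-kernel weighted average of training targets), trains only on fully observed rows, predicts each row's missing columns from the columns observed in that row, and, as a stand-in for an "ensemble for multiple imputation", fits m GRNNs on bootstrap resamples of the complete rows with a hypothetical list of bandwidths. The function names (`grnn_predict`, `grnn_impute`, `ensemble_impute`, `rmse_on_mask`), the bandwidth values, the 10% MCAR masking and the synthetic data are all illustrative assumptions, and features are assumed to be on comparable scales already. Only criterion (iii), point estimation accuracy, is illustrated, against mean substitution (MS).

```python
import numpy as np


def grnn_predict(X_train, Y_train, X_query, sigma):
    """Specht-style GRNN: Gaussian-kernel weighted average of training targets."""
    # Squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Shift by each row's minimum so the largest weight is exactly 1 (avoids 0/0 from underflow).
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    return (w @ Y_train) / w.sum(axis=1, keepdims=True)


def grnn_impute(X, sigma=0.3, train_rows=None):
    """Single imputation: fill each row's NaNs from the columns observed in that row."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    if train_rows is None:
        train_rows = np.where(~np.isnan(X).any(axis=1))[0]
    complete = X[train_rows]  # assumption: train only on fully observed rows
    if len(complete) == 0:
        raise ValueError("need at least one fully observed row")
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        miss = np.isnan(X[i])
        obs = ~miss
        if not obs.any():
            # Nothing observed in this row: fall back to mean substitution.
            out[i, miss] = complete[:, miss].mean(axis=0)
        else:
            out[i, miss] = grnn_predict(complete[:, obs], complete[:, miss],
                                        X[i:i + 1, obs], sigma)[0]
    return out


def ensemble_impute(X, m=5, sigmas=(0.2, 0.3, 0.5), seed=0):
    """Hypothetical multiple imputation: m GRNNs, each fit on a bootstrap of the complete rows."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    complete_idx = np.where(~np.isnan(X).any(axis=1))[0]
    draws = []
    for k in range(m):
        boot = rng.choice(complete_idx, size=len(complete_idx), replace=True)
        draws.append(grnn_impute(X, sigma=sigmas[k % len(sigmas)], train_rows=boot))
    return np.stack(draws)  # shape (m, n_rows, n_cols): one imputed dataset per draw


def rmse_on_mask(X_true, X_imputed, mask):
    """Point estimation accuracy: RMSE over the artificially removed entries only."""
    return float(np.sqrt(np.mean((X_true[mask] - X_imputed[mask]) ** 2)))


# Synthetic demo: three correlated columns, 10% of entries removed completely at random (MCAR).
rng = np.random.default_rng(42)
x = rng.normal(size=200)
X_true = np.column_stack([x, x + 0.1 * rng.normal(size=200), -x + 0.1 * rng.normal(size=200)])
mask = rng.random(X_true.shape) < 0.10
X_miss = X_true.copy()
X_miss[mask] = np.nan

pooled = ensemble_impute(X_miss).mean(axis=0)  # pool the m imputed datasets by averaging
mean_sub = np.where(mask, np.nanmean(X_miss, axis=0), X_miss)
print("GRNN ensemble RMSE:", rmse_on_mask(X_true, pooled, mask))
print("mean substitution RMSE:", rmse_on_mask(X_true, mean_sub, mask))
```

Averaging the draws gives a single point estimate; the spread of the m draws around that average is the between-imputation variability that multiple-imputation interval estimates (criterion (ii)) typically build on, which is why the sketch keeps all draws rather than returning only the mean.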