Knowledge-Based Systems

Imputation of missing data with neural networks for classification

Abstract

We propose a mechanism to use data with missing values for designing classifiers, which is different from predicting missing values for classification. Our imputation method uses an auto-encoder neural network. We make innovative use of the training instances without missing values to train the auto-encoder so that it is better equipped to predict missing values. It is a two-stage training scheme. Unlike most existing auto-encoder-based methods, which use a bottleneck layer for missing-data handling, we justify and use a latent space of much higher dimension than that of the input. To design a classifier using a training set with missing values, we use the trained auto-encoder to predict the missing values, based on the hypothesis that a good choice for a missing value is one that can reconstruct itself via the auto-encoder. For this we make an initial guess of the missing value using the nearest-neighbor rule and then refine it by minimizing the reconstruction error. We train several classifiers using the union of the imputed instances and the remaining training instances without missing values. We also train another classifier of the same type with the same configuration using the corresponding complete dataset, and the performance of these classifiers is compared. We compare the proposed method with eight state-of-the-art imputation techniques using fourteen datasets and eight classification strategies. (C) 2019 Elsevier B.V. All rights reserved.
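Below is a minimal sketch of the imputation idea summarized in the abstract, written in PyTorch. The layer sizes, activations, optimizer settings, and names (`OvercompleteAE`, `train_autoencoder`, `impute_row`) are illustrative assumptions rather than the authors' configuration, and the paper's two-stage training scheme is collapsed into a single ordinary training loop here.

```python
import numpy as np
import torch
import torch.nn as nn


class OvercompleteAE(nn.Module):
    """Auto-encoder whose latent space is wider than the input, as the abstract advocates."""

    def __init__(self, d_in: int, d_latent: int):
        super().__init__()
        assert d_latent > d_in  # higher-dimensional latent space, not a bottleneck
        self.enc = nn.Sequential(nn.Linear(d_in, d_latent), nn.Tanh())
        self.dec = nn.Linear(d_latent, d_in)

    def forward(self, x):
        return self.dec(self.enc(x))


def train_autoencoder(model, complete_rows, epochs=200, lr=1e-3):
    """Fit the auto-encoder on the training instances that have no missing values."""
    x = torch.as_tensor(complete_rows, dtype=torch.float32)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), x)
        loss.backward()
        opt.step()
    return model


def impute_row(model, row, complete_rows, steps=100, lr=1e-2):
    """Fill the NaN entries of `row`: initialise them from the nearest complete
    neighbour on the observed coordinates, then refine only those entries by
    minimising the auto-encoder reconstruction error ("reconstruct itself")."""
    model.requires_grad_(False)
    miss = np.isnan(row)
    # Nearest-neighbour initial guess, measured on the observed coordinates only.
    dists = np.linalg.norm(complete_rows[:, ~miss] - row[~miss], axis=1)
    init = row.copy()
    init[miss] = complete_rows[dists.argmin(), miss]

    free = torch.tensor(init, dtype=torch.float32, requires_grad=True)
    fixed = torch.as_tensor(np.where(miss, 0.0, row), dtype=torch.float32)
    mask = torch.as_tensor(miss)
    opt = torch.optim.Adam([free], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x = torch.where(mask, free, fixed)    # observed entries stay fixed
        loss = ((model(x) - x) ** 2).mean()   # reconstruction error of the candidate row
        loss.backward()
        opt.step()

    out = row.copy()
    out[miss] = free.detach().numpy()[miss]
    return out
```

In this sketch, each incomplete training row would be passed through `impute_row`, and the imputed rows together with the originally complete rows would then form the training set for the downstream classifiers, as the abstract describes.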
