Conference: European Conference on Genetic Programming

A Genetic Programming-Based Imputation Method for Classification with Missing Data

Abstract

Many industrial and real-world datasets suffer from the unavoidable problem of missing values. The ability to deal with missing values is an essential requirement for classification, because inadequate treatment of missing values may lead to large classification errors. The problem of missing data has been addressed extensively in the statistics literature and, to a lesser extent, in the classification literature. One of the most popular approaches to dealing with missing data is to use imputation methods to fill missing values with plausible values. Some powerful imputation methods, such as the regression-based imputations in MICE [36], are often suitable for batch imputation tasks. However, they are often expensive when used to impute missing values for every single incomplete instance in the unseen set for classification. This paper proposes a genetic programming-based imputation (GPI) method for classification with missing data that uses genetic programming as a regression method to impute missing values. Experiments on six benchmark datasets and five popular classifiers compare GPI with five other popular and advanced regression-based imputation methods in MICE on two measures: classification accuracy and computation time. The results show that, in most cases, GPI achieves classification accuracy at least as good as the other imputation methods, and sometimes significantly better. Moreover, using GPI to impute missing values for every single incomplete instance is dramatically faster than the other imputation methods.
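
The idea of using genetic programming as a regression method for imputation can be illustrated with a minimal sketch. It is not the authors' GPI implementation: the abstract gives no algorithmic details, so the function set, the tiny evolutionary loop (subtree mutation with truncation selection, no crossover), all parameter values, and the synthetic data below are assumptions made purely for illustration. What it tries to show is the workflow the abstract implies: a regression expression is evolved once on complete instances, after which imputing a single unseen incomplete instance is just one evaluation of that expression.

```python
import math
import random

# Function set for expression trees (assumed; division is protected against near-zero divisors).
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}

def random_tree(n_inputs, depth, rng):
    """Grow a random tree: ('x', index), ('c', constant), or (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_inputs))
        return ("c", rng.uniform(-2.0, 2.0))
    op = rng.choice(list(OPS))
    return (op, random_tree(n_inputs, depth - 1, rng), random_tree(n_inputs, depth - 1, rng))

def evaluate(tree, row):
    """Evaluate an expression tree on one row of input features."""
    kind = tree[0]
    if kind == "x":
        return row[tree[1]]
    if kind == "c":
        return tree[1]
    return OPS[kind](evaluate(tree[1], row), evaluate(tree[2], row))

def mse(tree, X, y):
    """Mean squared error of a tree; non-finite results count as infinitely bad."""
    total = 0.0
    for row, target in zip(X, y):
        try:
            total += (evaluate(tree, row) - target) ** 2
        except OverflowError:
            return float("inf")
    err = total / len(y)
    return err if math.isfinite(err) else float("inf")

def mutate(tree, n_inputs, rng):
    """Subtree mutation: replace a randomly reached node with a fresh random subtree."""
    if tree[0] in ("x", "c") or rng.random() < 0.2:
        return random_tree(n_inputs, 2, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, n_inputs, rng), right)
    return (op, left, mutate(right, n_inputs, rng))

def evolve(X, y, pop_size=60, generations=30, seed=0):
    """Evolve a regression expression predicting y from X (parameter values are arbitrary)."""
    rng = random.Random(seed)
    n_inputs = len(X[0])
    pop = [random_tree(n_inputs, 3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: mse(t, X, y))
        elite = pop[: pop_size // 4]  # the best quarter survives unchanged
        children = [mutate(rng.choice(elite), n_inputs, rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return min(pop, key=lambda t: mse(t, X, y))

def impute(row, target_idx, model):
    """Return a copy of row with row[target_idx] filled from the other features."""
    inputs = [v for i, v in enumerate(row) if i != target_idx]
    filled = list(row)
    filled[target_idx] = evaluate(model, inputs)
    return filled

# Synthetic complete training instances (hypothetical data): feature 2 equals f0 + f1.
data_rng = random.Random(1)
train = []
for _ in range(40):
    a, b = data_rng.uniform(0, 5), data_rng.uniform(0, 5)
    train.append([a, b, a + b])
X = [r[:2] for r in train]
y = [r[2] for r in train]

model = evolve(X, y)           # built once, on complete instances
incomplete = [1.5, 2.0, None]  # unseen instance with feature 2 missing
completed = impute(incomplete, 2, model)
print(completed)
```

In this sketch the expensive step is `evolve`, which runs once on the training data, while `impute` costs a single tree evaluation per incomplete instance. That separation is one plausible reading of why per-instance imputation with an evolved regression function could be much cheaper than rerunning a batch imputation procedure for every unseen instance.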
