Conference: European Conference on Genetic Programming

A Genetic Programming-Based Imputation Method for Classification with Missing Data


Abstract

Many industrial and real-world datasets suffer from the unavoidable problem of missing values. The ability to handle missing values is essential for classification, because inadequate treatment of them can lead to large classification errors. The problem of missing data has been addressed extensively in the statistics literature and, to a lesser extent, in the classification literature. One of the most popular approaches is imputation, which fills missing values with plausible values. Some powerful imputation methods, such as the regression-based imputations in MICE [36], are often suitable for batch imputation tasks. However, they are often expensive when used to impute missing values for every single incomplete instance in the unseen set for classification. This paper proposes a genetic programming-based imputation (GPI) method for classification with missing data, which uses genetic programming as a regression method to impute missing values. Experiments on six benchmark datasets with five popular classifiers compare GPI with five other popular and advanced regression-based imputation methods in MICE on two measures: classification accuracy and computation time. The results show that, in most cases, GPI achieves classification accuracy at least as good as the other imputation methods, and sometimes significantly better. Moreover, using GPI to impute missing values for every single incomplete instance is dramatically faster than using the other imputation methods.
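The idea of fitting a regression function once and then reusing it to impute each unseen incomplete instance can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: ordinary least squares stands in for the regression function that genetic programming would evolve, and the names `fit_imputers` and `impute_instance` are hypothetical.

```python
import numpy as np

def fit_imputers(X_train):
    """Fit one linear regressor per feature from the complete training rows.

    Assumption: least squares replaces the GP-evolved regression function.
    Returns {feature index: coefficient vector, bias term first}.
    """
    complete = X_train[~np.isnan(X_train).any(axis=1)]
    models = {}
    for j in range(X_train.shape[1]):
        others = np.delete(complete, j, axis=1)            # predictor features
        A = np.column_stack([np.ones(len(others)), others])
        coef, *_ = np.linalg.lstsq(A, complete[:, j], rcond=None)
        models[j] = coef
    return models

def impute_instance(x, models, feature_means):
    """Impute the missing entries of a single instance with pre-fitted models."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    # Fill gaps with training means first so each regressor sees full inputs.
    filled = np.where(missing, feature_means, x)
    result = filled.copy()
    for j in np.flatnonzero(missing):
        others = np.delete(filled, j)
        result[j] = models[j][0] + others @ models[j][1:]
    return result

# Synthetic data (hypothetical): the third feature equals 2*x0 + x1.
rng = np.random.default_rng(0)
x0, x1 = rng.normal(size=(2, 50))
X_train = np.column_stack([x0, x1, 2 * x0 + x1])

models = fit_imputers(X_train)               # done once, on training data
means = np.nanmean(X_train, axis=0)
imputed = impute_instance([1.0, 2.0, np.nan], models, means)
print(imputed)
```

Because the models are built once, imputing a new instance costs only a few arithmetic operations, which is the property the abstract contrasts with re-running a batch imputation method for every incomplete instance.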
