Conference: IEEE Congress on Evolutionary Computation

Impact of imputation of missing values on genetic programming based multiple feature construction for classification

Abstract

Missing values are a common problem in many real-world databases. A common way to cope with this problem is to use imputation methods to fill missing values with plausible values. Genetic programming-based multiple feature construction (GPMFC) is a filter approach that uses genetic programming to construct multiple features for classifiers. The GPMFC algorithm has been shown to improve the classification performance of decision tree and rule-based classifiers on complete data, but it has not been tested on imputed data. This paper studies the effect of GPMFC on classification accuracy with imputed data, and how the choice of imputation method (mean imputation, hot deck imputation, kNN imputation, EM imputation, and MICE imputation) affects classifiers that use the constructed features. Results show that GPMFC improves classification performance for datasets with a small number of missing values. In most cases, the combination of GPMFC and MICE imputation enhances classification performance for datasets with varying amounts of missing values and obtains the best classification accuracy.
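To make the idea of imputation concrete, the following is a minimal sketch of two of the simpler methods named in the abstract, mean imputation and kNN imputation. It is not the paper's implementation: the function names `mean_impute` and `knn_impute`, the use of `None` to mark a missing value, the distance definition, and the toy data are all assumptions made for illustration.

```python
import math


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column.

    Hypothetical helper: `rows` is a list of equal-length lists of floats,
    with None marking a missing value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Fall back to 0.0 if a column has no observed values (an assumption).
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=2):
    """Replace each None with the column mean over the k nearest donor rows.

    Distance is a root-mean-square difference over the columns that are
    observed in both rows (one simple choice among several possible ones).
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where column j is observed.
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            if donors:
                filled[i][j] = sum(val for _, val in donors) / len(donors)
    return filled


# Toy data (invented for this example): the second row is missing column 1.
rows = [[1.0, 2.0], [2.0, None], [3.0, 6.0], [10.0, 20.0]]
mean_filled = mean_impute(rows)
knn_filled = knn_impute(rows, k=2)
```

On this toy data, mean imputation fills the gap with the column average, which the outlying row pulls upward, while kNN imputation uses only the two closest rows and yields 4.0. A difference like this illustrates why the choice of imputation method could matter for features constructed downstream.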