A Genetic Programming-Based Imputation Method for Classification with Missing Data

Abstract

Many industrial and real-world datasets suffer from the unavoidable problem of missing values. The ability to deal with missing values is an essential requirement for classification, because inadequate treatment of missing values may lead to large classification errors. The problem of missing data has been addressed extensively in the statistics literature and, to a lesser extent, in the classification literature. One of the most popular approaches to dealing with missing data is to use imputation methods, which fill missing values with plausible values. Some powerful imputation methods, such as the regression-based imputations in MICE [36], are well suited to batch imputation tasks. However, they are often expensive when missing values must be imputed for every single incomplete instance in the unseen set for classification. This paper proposes a genetic programming-based imputation (GPI) method for classification with missing data, which uses genetic programming as a regression method to impute missing values. Experiments on six benchmark datasets with five popular classifiers compare GPI with five other popular and advanced regression-based imputation methods in MICE on two measures: classification accuracy and computation time. The results show that, in most cases, GPI achieves classification accuracy at least as good as the other imputation methods, and sometimes significantly better. Moreover, using GPI to impute missing values for every single incomplete instance is dramatically faster than the other imputation methods.
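The speed argument rests on a split between an expensive fitting step, done once per feature, and a cheap per-instance step that only evaluates a stored expression. The following is a minimal sketch of that idea under stated assumptions, not the authors' GPI implementation: a mutation-only genetic program (tournament selection, elitism, protected division) evolves one regression expression for a target feature from complete training rows. The names `evolve`, `fit_imputer`, and `impute`, the function set, and all parameter values are hypothetical, and the sketch assumes the other features of the instance being imputed are observed.

```python
import math
import random

# Protected division: returns 1.0 for a near-zero (or NaN) denominator,
# a common GP convention so evolved expressions never raise.
def pdiv(a, b):
    return a / b if abs(b) > 1e-9 else 1.0

# Hypothetical function set of (callable, symbol) pairs.
FUNCS = [(lambda a, b: a + b, "+"), (lambda a, b: a - b, "-"),
         (lambda a, b: a * b, "*"), (pdiv, "/")]

# Trees are nested tuples: ("x", i) reads input i, ("c", v) is a constant,
# ("f", k, left, right) applies FUNCS[k] to its two subtrees.
def random_tree(n_vars, depth, rng):
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_vars))
        return ("c", rng.uniform(-1.0, 1.0))
    k = rng.randrange(len(FUNCS))
    return ("f", k, random_tree(n_vars, depth - 1, rng),
            random_tree(n_vars, depth - 1, rng))

def evaluate(tree, x):
    if tree[0] == "x":
        return x[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return FUNCS[tree[1]][0](evaluate(tree[2], x), evaluate(tree[3], x))

def mse(tree, X, y):
    # Fitness is mean squared error; non-finite results count as worst.
    total = 0.0
    for xi, yi in zip(X, y):
        d = evaluate(tree, xi) - yi
        total += d * d
    err = total / len(y)
    return err if math.isfinite(err) else float("inf")

def mutate(tree, n_vars, rng):
    # Subtree mutation: replace this node or recurse into one child.
    if tree[0] != "f" or rng.random() < 0.2:
        return random_tree(n_vars, 2, rng)
    if rng.random() < 0.5:
        return ("f", tree[1], mutate(tree[2], n_vars, rng), tree[3])
    return ("f", tree[1], tree[2], mutate(tree[3], n_vars, rng))

def evolve(X, y, pop_size=100, generations=20, seed=0):
    rng = random.Random(seed)
    n_vars = len(X[0])
    pop = [random_tree(n_vars, 3, rng) for _ in range(pop_size)]
    scored = sorted(((mse(t, X, y), t) for t in pop), key=lambda p: p[0])
    for _ in range(generations):
        new = [scored[0][1]]  # elitism: the best tree always survives
        while len(new) < pop_size:
            winner = min(rng.sample(scored, 3), key=lambda p: p[0])[1]
            new.append(mutate(winner, n_vars, rng))
        scored = sorted(((mse(t, X, y), t) for t in new), key=lambda p: p[0])
    return scored[0][1], scored[0][0]  # (best tree, its training MSE)

def fit_imputer(rows, target, **gp_kwargs):
    """Expensive step, run once: evolve an expression for feature `target`
    from the other features, using only fully observed training rows."""
    complete = [r for r in rows if all(v is not None for v in r)]
    X = [[float(v) for j, v in enumerate(r) if j != target] for r in complete]
    y = [float(r[target]) for r in complete]
    tree, _ = evolve(X, y, **gp_kwargs)
    return tree

def impute(row, target, tree):
    """Cheap step, run per instance: evaluate the stored expression.
    Assumes every feature other than `target` is observed (not None)."""
    filled = list(row)
    if filled[target] is None:
        x = [float(v) for j, v in enumerate(row) if j != target]
        filled[target] = evaluate(tree, x)
    return filled

if __name__ == "__main__":
    # Toy data: the third feature is the sum of the first two.
    train = [[a, b, a + b] for a in range(1, 6) for b in range(1, 6)]
    train.append([2, 3, None])  # incomplete rows are skipped during fitting
    tree = fit_imputer(train, target=2)
    print(impute([4.0, 1.5, None], target=2, tree=tree))
```

All of the search cost sits in `fit_imputer`; `impute` is a single tree evaluation per incomplete value, which illustrates why per-instance imputation with a learned expression can be cheap. A practical GP system would usually add crossover, depth limits, and a richer function set.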

Bibliographic Information

Source
European Conference on Genetic Programming | 2016 | pp. 149-163 | 15 pages
Authors
Cao Truong Tran; Mengjie Zhang; Peter Andreae;
Original format: PDF
Keywords
Missing data; Imputation methods; Genetic programming; Symbolic regression; Classification

Similar Literature

1. A new analytical framework for missing data imputation and classification with uncertainty: Missing data imputation and heart failure readmission prediction [J]. Zhiyong Hu, Dongping Du. PLoS One, 2020, no. 9.

2. Comparative Performance of Imputation Methods for Different Proportions of Missing Data in Classification of Crop Genotypes [J]. Samarendra Das, Amrit Kumar Paul, S.D. Wahi. Journal of the Indian Society of Agricultural Statistics, 2017, no. 2.

3. Impact of missing data imputation methods on gene expression clustering and classification [J]. Marcilio CP de Souto, Pablo A Jaskowiak, Ivan G Costa. BMC Bioinformatics, 2015, no. 1.

4. A Genetic Programming-Based Imputation Method for Classification with Missing Data [C]. Cao Truong Tran, Mengjie Zhang, Peter Andreae. European Conference on Genetic Programming, 2016.

5. Evaluating Multiple Imputation Methods for Longitudinal Healthy Aging Index - A Score Variable with Data Missing Due to Death, Dropout and Several Missing Data Mechanisms [D]. Elizabeth L. Kane. 2017.

6. A new analytical framework for missing data imputation and classification with uncertainty: Missing data imputation and heart failure readmission prediction [O]. Zhiyong Hu, Dongping Du. 2020.
