Experimental analysis of methods for imputation of missing values in databases


Abstract

An important issue faced by researchers and practitioners who use industrial and research databases is incompleteness of data, usually in the form of missing or erroneous values. While some data analysis algorithms can work with incomplete data, a large portion of them require complete data. Therefore, different strategies, such as deletion of incomplete examples and imputation (filling) of missing values through a variety of statistical and machine learning (ML) procedures, have been developed to preprocess incomplete data. This study presents an experimental analysis of several algorithms for imputation of missing values, ranging from simple statistical algorithms such as mean and hot deck imputation to imputation algorithms based on inductive ML algorithms. Three major families of ML algorithms, namely probabilistic algorithms (e.g. Naive Bayes), decision tree algorithms (e.g. C4.5), and decision rule algorithms (e.g. CLIP4), are used to implement the ML-based imputation algorithms. The analysis is carried out using a comprehensive range of databases into which missing values were introduced randomly. The goal of this paper is to provide general guidelines for selecting suitable data imputation algorithms based on characteristics of the data. The guidelines are developed through a comprehensive experimental comparison of the performance of the different data imputation algorithms.
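The two simple statistical baselines named in the abstract, mean imputation and hot deck imputation, together with random introduction of missing values, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`introduce_missing`, `mean_impute`, `hot_deck_impute`), the numeric-only data, and the choice of a nearest-donor hot deck using squared Euclidean distance over observed attributes are all assumptions made for illustration. Hot deck imputation has several variants, and the abstract does not say which one the paper uses.

```python
import random
from statistics import mean


def introduce_missing(rows, rate, seed=0):
    """Return a copy of rows where each value becomes None with probability `rate`."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def mean_impute(rows):
    """Fill each missing value with the mean of the observed values in its column."""
    col_means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        # A column with no observed values cannot be imputed and stays None.
        col_means.append(mean(observed) if observed else None)
    return [[col_means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def hot_deck_impute(rows):
    """Fill each incomplete row from the most similar complete row (the donor).

    Assumption: similarity is squared Euclidean distance over the attributes
    that are observed in the incomplete row.
    """
    donors = [row for row in rows if None not in row]
    if not donors:
        return [list(row) for row in rows]  # nothing to copy from

    def distance(row, donor):
        return sum((a - b) ** 2 for a, b in zip(row, donor) if a is not None)

    result = []
    for row in rows:
        if None in row:
            donor = min(donors, key=lambda d: distance(row, d))
            row = [d if v is None else v for v, d in zip(row, donor)]
        result.append(list(row))
    return result


# Hypothetical toy data: two numeric attributes, two incomplete examples.
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [None, 29.0], [1.1, None]]
mean_filled = mean_impute(data)
hot_deck_filled = hot_deck_impute(data)
```

On this toy data, mean imputation fills both gaps with column averages regardless of the other attribute, while the hot deck variant copies the missing value from the complete example closest to the incomplete one. An experimental comparison of the kind the abstract describes would apply `introduce_missing` to complete databases at chosen rates and measure how well each method recovers the removed values.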


Bibliographic details

Source
Conference on Intelligent Computing: Theory and Applications II; April 12-13, 2004; Orlando, FL, US | 2004 | pp. 172-182 | 11 pages
Conference location: Orlando, FL (US)
Authors
Alireza Farhangfar; Lukasz Kurgan; Witold Pedrycz;
Author affiliation

Electrical and Computer Engineering Department University of Alberta, Edmonton, AB, Canada, T6G 2V4;

Original format: PDF
Language: English
Chinese Library Classification: Automation system theory
Keywords
incompleteness; missing values; imputation; preprocessing; machine learning;


