Adaptive pairing of classifier and imputation methods based on the characteristics of missing values in data sets

Sim Jaemun; Kwon Ohbyung; Lee Kun Chang




Abstract

Classifiers and imputation methods play crucial roles in big data analytics. In particular, for data sets characterized by horizontal scattering, vertical scattering, level of spread, compound metric, imbalance ratio, and missing ratio, the way classifiers and imputation methods are combined leads to significantly different performance. It is therefore essential to identify the characteristics of a data set in advance so that the optimal combination of imputation method and classifier can be selected. However, this is a very costly process. This paper proposes a novel method for automatic, adaptive selection of the optimal combination of classifier and imputation method based on the features of a given data set. In performance evaluations with multiple data sets, the proposed method demonstrated superior performance. Decision makers in big data analytics could benefit greatly from the proposed method when dealing with data sets in which the distribution of missing data varies in real time. (C) 2015 Elsevier Ltd. All rights reserved.
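The abstract does not spell out the selection algorithm, so the following is only an illustrative sketch of the general idea: compute a few data set features, then retrieve the closest stored case to pick an imputation/classifier pair (the keywords mention case-based reasoning). Everything specific here is an assumption: the function names, the definition of imbalance ratio as majority count over minority count, the use of only two of the six features, the Euclidean distance, and the entries in the hypothetical `CASE_BASE`. None of it is taken from the paper.

```python
from collections import Counter
from math import sqrt


def missing_ratio(rows):
    """Fraction of cells that are None across all rows."""
    total = sum(len(row) for row in rows)
    missing = sum(1 for row in rows for value in row if value is None)
    return missing / total if total else 0.0


def imbalance_ratio(labels):
    """Majority class count divided by minority class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical case base: (missing_ratio, imbalance_ratio) -> (imputation, classifier).
# The pairs are placeholders, not results reported in the paper.
CASE_BASE = [
    ((0.05, 1.0), ("mean imputation", "decision tree")),
    ((0.30, 1.0), ("k-NN imputation", "naive Bayes")),
    ((0.30, 4.0), ("k-NN imputation", "decision tree")),
]


def euclidean(a, b):
    """Plain Euclidean distance; a real system would likely normalize features first."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend_pair(features, case_base=CASE_BASE):
    """Return the pair stored with the case whose features are nearest."""
    nearest = min(case_base, key=lambda case: euclidean(features, case[0]))
    return nearest[1]


if __name__ == "__main__":
    rows = [[1.0, None], [3.0, 4.0]]
    labels = ["a", "a", "a", "b"]
    features = (missing_ratio(rows), imbalance_ratio(labels))
    print(features, recommend_pair(features))
```

Because the lookup only depends on cheap summary features, it could be re-run whenever the missing-data pattern of an incoming data set changes, which is the scenario the abstract highlights.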


Bibliographic details

Source
Expert Systems with Applications | 2016, Issue 3 | pp. 485-493 | 9 pages
Authors
Sim Jaemun; Kwon Ohbyung; Lee Kun Chang
Author affiliations

Sungkyunkwan Univ, SKKU Business Sch, Seoul 110745, South Korea;

Kyung Hee Univ, Sch Management, Seoul 130701, South Korea;

Sungkyunkwan Univ, SKKU Business Sch, Seoul 110745, South Korea;

Original format: PDF
Language: English
Keywords
Classification algorithms; Imputation methods; Case-based reasoning; Experiments;


