Missing value imputation using a novel grey based fuzzy c-means, mutual information based feature selection, and regression model

Sefidian Amir Masoud; Daneshpour Negin

Abstract

Missing values in real-world data are both prevalent and inevitable, so they should be handled carefully before any mining or learning process. This paper proposes a novel technique for imputing missing data. It employs a new version of the Fuzzy c-Means clustering algorithm that benefits from the advantages of the Grey Relational Grade over Minkowski-like similarity measures. To impute a missing value more accurately, it also performs a local, mutual-information-based feature selection within each cluster so that only highly relevant features are used. Briefly, missing values are imputed in the following steps. First, the algorithm finds the importance of each missing attribute. Next, input instances are separated into several fuzzy clusters. Then, the algorithm selects the clusters that satisfy a minimum condition. After that, it chooses highly dependent features of the instances within each cluster using a mutual-information-based feature selection approach. Once the features are selected, regression models are applied to the selected features of the selected clusters to provide estimates for a missing value. Finally, the missing value is imputed as a weighted average of the estimates obtained in the previous step. Three well-known evaluation criteria and the accuracy of a classification task are used to assess the performance of the proposed method. The experimental results on seven UCI data sets with different missing ratios and strategies indicate that the proposed algorithm generally outperforms five other imputation methods. (C) 2018 Elsevier Ltd. All rights reserved.
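As a rough illustration of the Grey Relational Grade mentioned in the abstract, the sketch below uses Deng's standard formulation of grey relational analysis with a distinguishing coefficient rho = 0.5. The function name, the default rho, and the assumption that features are already scaled to comparable ranges are illustrative choices, not details taken from the paper. How the paper embeds this grade into Fuzzy c-Means is not shown here.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Return the Grey Relational Grade of each candidate w.r.t. the reference.

    Assumes every vector has the same length and that features are already
    scaled to comparable ranges (for example, min-max normalised).
    """
    # Absolute difference per candidate and per feature.
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate equals the reference: maximal similarity.
        return [1.0 for _ in candidates]
    grades = []
    for row in deltas:
        # Grey relational coefficient of each feature, then its mean (the grade).
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades
```

A grade close to 1 means the candidate tracks the reference closely. Unlike a Minkowski distance, the grade is computed relative to the smallest and largest deviations found across the whole comparison set.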

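The feature selection and final combination steps described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions: features are taken to be already discretised, mutual information is estimated from simple frequency counts, and the weights passed to the final average are placeholders (for instance, cluster memberships). The abstract does not specify the paper's estimator, its regression models, or its weighting scheme, and all function and feature names here are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # I(X;Y) = sum over observed pairs of p(x,y) * log(p(x,y) / (p(x) * p(y)))
    return sum(
        (c / n) * log(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def select_features(features, target, k):
    """Return the names of the k features sharing the most information with target.

    `features` maps a (hypothetical) feature name to its discretised values.
    """
    ranked = sorted(
        features,
        key=lambda name: mutual_information(features[name], target),
        reverse=True,
    )
    return ranked[:k]


def weighted_estimate(estimates, weights):
    """Combine per-cluster estimates into one value by a weighted average."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * e for w, e in zip(weights, estimates)) / total


# Toy usage: feature "a" determines the target, feature "b" is independent of it.
features = {"a": [0, 0, 1, 1], "b": [0, 1, 0, 1]}
target = [0, 0, 1, 1]
print(select_features(features, target, k=1))        # ['a']
print(weighted_estimate([2.0, 4.0], [0.75, 0.25]))   # 2.5
```

In a fuller pipeline, each selected cluster would fit its own regression model on the selected features, and the per-cluster predictions would be passed to `weighted_estimate`.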

Bibliographic details

Source: Expert Systems with Applications, 2019, Issue 1, pp. 68-94 (27 pages)
Authors: Sefidian Amir Masoud; Daneshpour Negin
Affiliation: Shahid Rajaee Teacher Training Univ, Fac Comp Engn, Tehran, Iran
Original format: PDF
Language: English
Keywords: Missing data imputation; Grey relational analysis; Fuzzy c-means; Mutual information; Regression
