Journal: Expert Systems with Applications

Missing value imputation using a novel grey based fuzzy c-means, mutual information based feature selection, and regression model

Abstract

Missing values in real-world data are not only prevalent but also inevitable, so they should be handled carefully before any mining or learning process. This paper proposes a novel technique for imputing missing data. It employs a new version of the Fuzzy c-Means clustering algorithm that benefits from the advantages of the Grey Relational Grade over Minkowski-like similarity measures. To impute a missing value more accurately, it also performs a local, mutual information based feature selection within each cluster so that only highly relevant features are used. Briefly, missing values are imputed in the following steps. First, the algorithm determines the importance of each missing attribute. Next, the input instances are partitioned into several fuzzy clusters. Then, the algorithm selects the clusters that satisfy a minimum condition. After that, it chooses the highly dependent features of the instances within each selected cluster using a mutual information based feature selection approach. Once the features are selected, regression models are applied to the selected features of the selected clusters to provide estimates of a missing value. Finally, the missing value is imputed as a weighted average of the estimates obtained in the previous step. Three well-known evaluation criteria and the accuracy of a classification task are used to assess the performance of the proposed method. The experimental results on seven UCI data sets with different missing ratios and strategies indicate that the proposed algorithm generally outperforms five other imputation methods. (C) 2018 Elsevier Ltd. All rights reserved.
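The abstract names the Grey Relational Grade as its similarity measure and a weighted average as its final combination step, but gives no formulas. The sketch below is only an illustration under stated assumptions: it uses the commonly cited form of the grey relational coefficient with a distinguishing coefficient `rho` (0.5 is a conventional default), assumes attributes are already normalized and complete, and uses a plain weighted mean. The function names `grey_relational_grades` and `weighted_average` are hypothetical, and the paper's actual variant and weighting scheme may differ.

```python
from typing import List, Sequence


def grey_relational_grades(
    reference: Sequence[float],
    candidates: Sequence[Sequence[float]],
    rho: float = 0.5,
) -> List[float]:
    """Grey Relational Grade of each candidate with respect to the reference.

    Assumption: attributes are normalized to a common scale and contain
    no missing values. `rho` is the distinguishing coefficient, 0 < rho <= 1.
    """
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must lie in (0, 1]")
    # Absolute difference per candidate and per attribute.
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:
        # Every candidate is identical to the reference.
        return [1.0 for _ in candidates]
    grades = []
    for row in deltas:
        # Grey relational coefficient for each attribute, then its mean.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


def weighted_average(estimates: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-cluster estimates into one value (illustrative weighting)."""
    total = sum(weights)
    if total == 0.0:
        raise ValueError("weights must not sum to zero")
    return sum(e * w for e, w in zip(estimates, weights)) / total


if __name__ == "__main__":
    grades = grey_relational_grades([0.0, 0.0], [[0.0, 0.0], [1.0, 1.0]])
    print(grades)
    print(weighted_average([2.0, 4.0], [1.0, 3.0]))
```

Unlike a Minkowski distance, the grade is bounded in (0, 1] and is computed relative to the smallest and largest differences in the compared set, so a candidate's score depends on which other candidates it is compared alongside.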