International Conference on Soft Computing Models in Industrial and Environmental Applications

Analysis and Application of Normalization Methods with Supervised Feature Weighting to Improve K-means Accuracy

Abstract

Normalization methods are widely employed to transform the variables, or features, of a given dataset. In this paper, three classical feature normalization methods, Standardization (St), Min-Max (MM) and Median Absolute Deviation (MAD), are studied on different synthetic datasets from the UCI repository. The ranges of the transformed features and their influence on the Euclidean distance are analyzed exhaustively; the analysis concludes that selecting the best normalization method for a given dataset requires knowledge of the group structure captured by each feature. To capture each feature's importance and adjust its contribution accordingly, this paper proposes a two-stage methodology for normalization and supervised feature weighting, based on the Pearson correlation coefficient and on a Random Forest feature importance estimation method. Simulations on five different datasets show that, in terms of accuracy, the proposed two-stage methodology outperforms or at least matches the K-means performance obtained when only normalization is applied.
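
The two stages described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption: the three normalizations use their common textbook definitions, the weights are taken as absolute Pearson correlations between each feature and a numerically encoded class label, rescaled to sum to one, and the weights are applied as multiplicative factors inside the Euclidean distance. The function names and the toy data are hypothetical, and the Random Forest variant is not shown.

```python
import math
import statistics


def standardize(col):
    """Standardization (St): zero mean, unit population standard deviation."""
    mu = statistics.fmean(col)
    sd = statistics.pstdev(col)
    return [(x - mu) / sd for x in col]


def min_max(col):
    """Min-Max (MM): rescale the feature to the interval [0, 1]."""
    lo, hi = min(col), max(col)
    return [(x - lo) / (hi - lo) for x in col]


def mad_normalize(col):
    """MAD (assumed form): centre on the median, divide by the median absolute deviation."""
    med = statistics.median(col)
    mad = statistics.median([abs(x - med) for x in col])
    return [(x - med) / mad for x in col]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def pearson_weights(columns, labels):
    """Assumed weighting scheme: |correlation with the label|, rescaled to sum to 1."""
    raw = [abs(pearson(col, labels)) for col in columns]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_distance(a, b, weights):
    """Euclidean distance after scaling each feature by its weight."""
    return math.sqrt(sum((w * (p - q)) ** 2 for p, q, w in zip(a, b, weights)))


# Hypothetical toy data: f1 separates the two classes, f2 is mostly noise.
labels = [0, 0, 0, 1, 1, 1]
f1 = [1.0, 2.0, 1.5, 8.0, 9.0, 8.5]
f2 = [50.0, 10.0, 30.0, 40.0, 20.0, 35.0]

# Stage 1: normalize each feature (Min-Max here; swap in standardize or mad_normalize).
columns = [min_max(f1), min_max(f2)]

# Stage 2: supervised weights from the labels, then a weighted distance between two samples.
weights = pearson_weights(columns, labels)
rows = list(zip(*columns))
plain = weighted_distance(rows[0], rows[1], [1.0, 1.0])
weighted = weighted_distance(rows[0], rows[1], weights)
print("weights:", weights)
print("distance between samples 0 and 1 (same class):", plain, "->", weighted)
```

In this sketch the noisy feature receives a small weight, so it contributes less to the distance that K-means would use to assign samples to clusters. Whether this matches the paper's exact weighting scheme cannot be confirmed from the abstract alone.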
