Journal: Expert Systems with Applications

Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria

Abstract

This paper provides a new approach to feature selection based on the concept of feature filters, so that feature selection is independent of the prediction model. Data fitting is stated as a single-objective optimization problem, where the objective function indicates the error of approximating the target vector as some function of given features. Linear dependence between features induces the multicollinearity problem and leads to instability of the model and redundancy of the feature set. This paper introduces a feature selection method based on quadratic programming. This approach takes into account the mutual dependence of the features and the target vector, and selects features according to relevance and similarity measures defined for the specific problem. The main idea is to minimize mutual dependence and maximize approximation quality by varying a binary vector that indicates the presence of features. The selected model is less redundant and more stable. To evaluate the quality of the proposed feature selection method and compare it with others, we use several criteria to measure instability and redundancy. In our experiments, we compare the proposed approach with several other feature selection methods, and show that the quadratic programming approach gives superior results according to the criteria considered for the test and real data sets. © 2017 Elsevier Ltd. All rights reserved.
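The main idea in the abstract (vary a binary indicator vector so that mutual dependence among selected features is low and relevance to the target is high) can be illustrated with a small sketch. The abstract does not state the exact objective, measures, or solver, so everything specific here is an assumption: absolute Pearson correlation is used for both similarity and relevance, the cost is pairwise similarity minus relevance, and the binary vector is found by exhaustive search rather than by a quadratic programming solver. The names `pearson` and `select_features` and the toy data are hypothetical.

```python
from itertools import product
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def select_features(features, target):
    """Return indices of the subset minimizing (pairwise similarity - relevance).

    Assumption: similarity and relevance are absolute Pearson correlations.
    Exhaustive search over binary vectors is only feasible for a few features.
    """
    n = len(features)
    sim = [[abs(pearson(features[i], features[j])) for j in range(n)]
           for i in range(n)]
    rel = [abs(pearson(f, target)) for f in features]

    best_mask, best_cost = None, float("inf")
    for mask in product((0, 1), repeat=n):  # every binary indicator vector
        redundancy = sum(sim[i][j]
                         for i in range(n) for j in range(i + 1, n)
                         if mask[i] and mask[j])
        relevance = sum(rel[i] for i in range(n) if mask[i])
        cost = redundancy - relevance
        if cost < best_cost:
            best_mask, best_cost = mask, cost
    return tuple(i for i, bit in enumerate(best_mask) if bit)


# Toy data: x2 is a near-copy of x1 (multicollinearity), x3 is unrelated to x1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1, 2, 3, 4, 5, 7]
x3 = [1, -2, 1, 1, -2, 1]
y = [a + c for a, c in zip(x1, x3)]  # target depends on x1 and x3

selected = select_features([x1, x2, x3], y)
print(selected)
```

On this toy data the near-duplicate pair `x1`, `x2` is penalized by the similarity term, so the search keeps only one of them alongside `x3`. For realistic feature counts the exhaustive loop would have to be replaced by a relaxed quadratic program, which is the direction the abstract describes.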
