A new graph feature selection approach

Abstract

Feature selection (FS) is an important pre-processing technique in machine learning and data mining. It aims to select a small subset of relevant and informative features from an original feature space that may contain many irrelevant, redundant, and noisy features. Feature selection usually leads to better performance, better interpretability, and lower computational cost. In the literature, FS methods are categorized into three main approaches: filters, wrappers, and embedded methods. In this paper we introduce a new feature selection method called graph feature selection (GFS). The main steps of GFS are as follows. First, we create a weighted graph in which each node corresponds to a feature, and the weight between two nodes is computed from a matrix of individual and pairwise scores obtained with a decision-tree classifier. Second, at each iteration we split the graph into two random partitions with the same number of nodes, then keep moving the worst node from one partition to the other until the global modularity converges. Third, from the final best partition we select the top-ranked features according to a newly proposed variable-importance criterion. The results of GFS are compared to three well-known feature selection algorithms on nine benchmark datasets. The proposed method shows its ability and effectiveness at identifying the most informative feature subset.
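The graph-partitioning step described in the abstract can be sketched as follows. This is a minimal illustration based only on the abstract: the input `pair_scores` (a hypothetical dict mapping feature pairs to decision-tree-derived weights), the modularity proxy, and the node-move rule are all assumptions, since the paper's exact formulas are not given here. In particular, the sketch greedily tries flipping every node and relaxes the equal-size constraint, whereas the paper moves the single worst node at each step.

```python
import random

def build_feature_graph(pair_scores):
    """Nodes are features; the weight of edge (i, j) stands in for the
    matrix of individual and pairwise decision-tree scores."""
    nodes = sorted({n for edge in pair_scores for n in edge})
    return nodes, dict(pair_scores)

def modularity(weights, side):
    """Toy proxy for the paper's global modularity: total intra-partition
    edge weight minus total inter-partition edge weight."""
    intra = inter = 0.0
    for (i, j), w in weights.items():
        if side[i] == side[j]:
            intra += w
        else:
            inter += w
    return intra - inter

def gfs_partition(pair_scores, seed=0):
    """Split the features into two equal-sized random partitions, then
    greedily move nodes across until the modularity proxy stops improving."""
    rng = random.Random(seed)
    nodes, weights = build_feature_graph(pair_scores)
    order = nodes[:]
    rng.shuffle(order)
    half = len(order) // 2
    side = {n: (0 if k < half else 1) for k, n in enumerate(order)}
    best = modularity(weights, side)
    improved = True
    while improved:
        improved = False
        for n in nodes:
            side[n] ^= 1                      # tentatively flip the node
            score = modularity(weights, side)
            if score > best:
                best, improved = score, True  # keep the improving move
            else:
                side[n] ^= 1                  # revert: no improvement
    return side, best
```

On exit, no single node move can improve the score, so the returned partition is a local optimum of the modularity proxy; the final feature-ranking step within the best partition is omitted, as the abstract does not specify the variable-importance criterion.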
