Journal: Pattern Recognition: The Journal of the Pattern Recognition Society

Selecting feature subset for high dimensional data via the propositional FOIL rules

Abstract

Feature interaction is an important issue in feature subset selection, yet most existing algorithms focus only on handling irrelevant and redundant features. This paper proposes FRFS, an algorithm based on propositional FOIL rules for selecting feature subsets from high-dimensional data. FRFS not only retains relevant features and excludes irrelevant and redundant ones, but also accounts for feature interaction. FRFS first merges the features that appear in the antecedents of all FOIL rules, yielding a candidate feature subset that excludes redundant features and preserves interactive ones. It then identifies and removes irrelevant features by evaluating the features in the candidate subset with a new metric, CoverRatio, and obtains the final feature subset. The efficiency and effectiveness of FRFS are tested extensively on both synthetic and real-world data sets. FRFS is compared with six other representative feature subset selection algorithms (CFS, FCBF, Consistency, Relief-F, INTERACT, and the rule-based FSBAR) in terms of the number of selected features, runtime, and the classification accuracy of four well-known classifiers (Naive Bayes, C4.5, PART, and IB1) before and after feature selection. The results on the five synthetic data sets show that FRFS can effectively identify irrelevant and redundant features while preserving interactive ones. The results on the 35 real-world high-dimensional data sets demonstrate that, compared with the other six feature selection algorithms, FRFS not only reduces the feature space efficiently but also significantly improves the performance of the four classifiers.
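
The two-stage idea described in the abstract (collect the features used in rule antecedents, then filter the candidates with a coverage-based score) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the paper's implementation: the rule learner is a plain greedy sequential-covering loop driven by the standard FOIL gain, and `coverage_share` is a hypothetical stand-in score, because the abstract names CoverRatio without defining it. The function names, the `min_share` threshold, and the toy data are likewise invented here.

```python
import math
from itertools import product

def foil_gain(p0, n0, p1, n1):
    """Standard FOIL gain for a literal that narrows coverage from (p0, n0) to (p1, n1)."""
    if p1 == 0:
        return 0.0
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

def covers(rule, row):
    """A rule is a list of (feature_index, value) literals joined by AND."""
    return all(row[i] == v for i, v in rule)

def learn_rules(rows, labels, target):
    """Greedy sequential covering for one class (a simplified FOIL-style learner)."""
    pos = [r for r, y in zip(rows, labels) if y == target]
    neg = [r for r, y in zip(rows, labels) if y != target]
    rules = []
    while pos:
        rule, p, n = [], pos, neg
        while n:  # specialise until no negatives remain or no literal helps
            candidates = {(i, r[i]) for r in p for i in range(len(r))} - set(rule)
            best, best_gain = None, 0.0
            for i, v in sorted(candidates):
                p1 = sum(1 for r in p if r[i] == v)
                n1 = sum(1 for r in n if r[i] == v)
                gain = foil_gain(len(p), len(n), p1, n1)
                if gain > best_gain:
                    best, best_gain = (i, v), gain
            if best is None:
                break
            rule.append(best)
            p = [r for r in p if r[best[0]] == best[1]]
            n = [r for r in n if r[best[0]] == best[1]]
        if not rule:
            break
        rules.append(rule)
        pos = [r for r in pos if not covers(rule, r)]  # drop covered positives
    return rules

def coverage_share(feature, rules, rows):
    """Hypothetical stand-in score (NOT the paper's CoverRatio): fraction of rows
    covered by at least one rule whose antecedent mentions the feature."""
    using = [r for r in rules if any(i == feature for i, _ in r)]
    return sum(1 for row in rows if any(covers(r, row) for r in using)) / len(rows)

def select_features(rows, labels, min_share=0.1):
    """Stage 1: merge antecedent features. Stage 2: drop low-scoring candidates."""
    rules = [r for c in sorted(set(labels)) for r in learn_rules(rows, labels, c)]
    candidates = sorted({i for rule in rules for i, _ in rule})
    return [f for f in candidates if coverage_share(f, rules, rows) >= min_share]

# Toy data: label = f0 AND f1; f2 is noise; f3 is a redundant copy of f0.
rows = [(a, b, c, a) for a, b, c in product((0, 1), repeat=3)]
labels = [int(a == 1 and b == 1) for a, b, c, _ in rows]
print(select_features(rows, labels))
```

On this toy data the sketch keeps features 0 and 1: the noise feature never yields positive gain, and the redundant copy only ties with feature 0, so neither enters a rule antecedent. A purely greedy gain criterion like this one cannot pick up a balanced XOR-style interaction, since no single literal has positive gain there; how FRFS itself handles such cases is not stated in the abstract.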
