Selecting feature subset for high dimensional data via the propositional FOIL rules

Wang G.; Song Q.; Xu B.; Zhou Y.

Abstract

Feature interaction is an important issue in feature subset selection. However, most existing algorithms focus only on irrelevant and redundant features. In this paper, FRFS, an algorithm based on propositional FOIL rules, is proposed for selecting feature subsets for high dimensional data; it not only retains relevant features and excludes irrelevant and redundant ones but also takes feature interaction into account. FRFS first merges the features that appear in the antecedents of all FOIL rules, yielding a candidate feature subset that excludes redundant features and retains interactive ones. It then identifies and removes irrelevant features by evaluating each feature in the candidate subset with a new metric, CoverRatio, and obtains the final feature subset. The efficiency and effectiveness of FRFS are extensively tested on both synthetic and real-world data sets, and FRFS is compared with six other representative feature subset selection algorithms (CFS, FCBF, Consistency, Relief-F, INTERACT, and the rule-based FSBAR) in terms of the number of selected features, runtime, and the classification accuracy of four well-known classifiers (Naive Bayes, C4.5, PART, and IB1) before and after feature selection. The results on the five synthetic data sets show that FRFS can effectively identify irrelevant and redundant features while retaining interactive ones. The results on the 35 real-world high dimensional data sets demonstrate that, compared with the six other feature selection algorithms, FRFS not only efficiently reduces the feature space but also significantly improves the performance of the four classifiers.
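The two stages of FRFS described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the FOIL rules are assumed to be already learned and represented as conjunctions of `feature == value` tests. Because the abstract does not define CoverRatio, the `cover_ratio` function below is a hypothetical stand-in (the fraction of instances covered by at least one rule whose antecedent uses the feature), and the `threshold` parameter, the `Rule` class, and the toy data are likewise illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple

# A condition (feature_index, value) stands for the test "x[feature_index] == value".
Condition = Tuple[int, object]


@dataclass
class Rule:
    antecedent: List[Condition]  # conjunction of conditions (rule body)
    label: object                # class predicted when the body holds


def candidate_features(rules: Sequence[Rule]) -> Set[int]:
    """Stage 1: union of all features that appear in any rule antecedent.

    Features that co-occur in one conjunction (interacting features) are kept
    together; features never used by any rule are dropped.
    """
    return {f for rule in rules for f, _ in rule.antecedent}


def covers(rule: Rule, row: Sequence[object]) -> bool:
    """True if every condition in the rule body holds for this instance."""
    return all(row[f] == v for f, v in rule.antecedent)


def cover_ratio(feature: int, rules: Sequence[Rule], X: Sequence[Sequence[object]]) -> float:
    """HYPOTHETICAL stand-in for CoverRatio (not defined in the abstract):
    fraction of instances covered by at least one rule whose antecedent uses `feature`."""
    if not X:
        return 0.0
    using = [r for r in rules if any(f == feature for f, _ in r.antecedent)]
    covered = sum(1 for row in X if any(covers(r, row) for r in using))
    return covered / len(X)


def select_features(rules: Sequence[Rule], X: Sequence[Sequence[object]], threshold: float) -> List[int]:
    """Stage 2: keep candidate features whose score reaches `threshold` (hypothetical)."""
    return sorted(f for f in candidate_features(rules) if cover_ratio(f, rules, X) >= threshold)


# Toy data: 4 instances, 5 discrete features; feature 4 is never used by a rule.
X = [
    (0, 1, 0, 1, 1),
    (1, 1, 0, 0, 0),
    (0, 0, 1, 1, 1),
    (1, 0, 1, 0, 0),
]
rules = [
    Rule([(0, 0), (1, 1)], "a"),  # covers instance 0
    Rule([(2, 1)], "b"),          # covers instances 2 and 3
    Rule([(3, 1), (0, 0)], "a"),  # covers instances 0 and 2
]

print(sorted(candidate_features(rules)))  # [0, 1, 2, 3]
print(select_features(rules, X, 0.3))     # [0, 2, 3]
```

In this toy run, feature 4 never appears in a rule body and is dropped in stage 1, while feature 1 is dropped in stage 2 because the only rule using it covers a single instance. With the paper's actual CoverRatio definition, the scores and the selected subset may differ.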

Bibliographic details

Source: Pattern Recognition: The Journal of the Pattern Recognition Society, 2013, Issue 1, 16 pages
Authors: Wang G.; Song Q.; Xu B.; Zhou Y.
Original format: PDF
Language: English
CLC classification: Computing technology, computer technology
Keywords: Feature interaction; Feature subset selection; Filter method; Propositional FOIL rule

Similar documents

1. Selecting feature subset for high dimensional data via the propositional FOIL rules [J]. Wang G., Song Q., Xu B. Pattern Recognition: The Journal of the Pattern Recognition Society, 2013, Issue 1.
2. Parallel Frequent Dataset Mining and Feature Subset Selection for High Dimensional Data on Hadoop using Map-Reduce [J]. Sandhya S. Waghere, Pothuraju Rajarajeswari. International Journal of Applied Engineering Research, 2017, Issue 18aPta5.
3. IoT based Smart Farming: Feature subset selection for optimized high dimensional data using improved GA based approach for ELM [J]. Kale Archana P., Sonavane Shefali P. Computers and Electronics in Agriculture, 2019.
4. Feature Selection on High Dimensional Data using Wrapper Based Subset Selection [C]. Manikandan G., Susi E., Abirami S. International Conference on Recent Trends and Challenges in Computational Models, 2017.
5. Classification and variable selection for high dimensional multivariate binary data: Adaboost based new methods and a theory for the plug-in rule [D]. Park, Junyong. 2006.
6. An Efficient Feature Subset Selection Algorithm for Classification of Multidimensional Dataset [O]. Senthilkumar Devaraj, S. Paulraj. 2015.
7. Hybrid approaches to feature subset selection for data classification in high-dimensional feature space [O]. Maysa Ibrahem Almulla Khalaf, John Q Gan. 2020.
