Including Multi-feature Interactions and Redundancy for Feature Ranking in Mixed Datasets


Abstract

Feature ranking helps to gain knowledge about a high-dimensional dataset and to identify its relevant features. However, in several datasets, a few features may each have only a small correlation with the target classes on their own, yet become strongly correlated with the target when combined with other features. This means that multiple features exhibit interactions among themselves. It is necessary to rank the features based on these interactions for better analysis and classifier performance. However, evaluating these interactions on large datasets is computationally challenging. Furthermore, datasets often have features with redundant information. Using such redundant features hinders both the efficiency and the generalization capability of the classifier. The major challenge is to efficiently rank the features based on relevance and redundancy on mixed datasets. In this work, we propose a filter-based framework based on Relevance and Redundancy (RaR). RaR computes a single score that quantifies the feature relevance by considering interactions between features and redundancy. The top-ranked features of RaR are characterized by maximum relevance and non-redundancy. The evaluation on synthetic and real-world datasets demonstrates that our approach outperforms several state-of-the-art feature selection techniques. Code and data related to this chapter are available at: https://doi.org/10.6084/m9.figshare.5418706.
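
The two effects the abstract describes, feature interaction and redundancy, can be illustrated on a toy dataset. The sketch below is not the RaR scoring itself: it assumes empirical mutual information as the relevance measure (the abstract does not name one), and the names `mutual_information`, `f1`, `f2`, `f3`, and `target` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Toy dataset: every combination of two binary features, repeated 25 times.
rows = list(product([0, 1], repeat=2)) * 25
f1 = [a for a, _ in rows]
f2 = [b for _, b in rows]
f3 = list(f1)                      # exact copy of f1, i.e. a redundant feature
target = [a ^ b for a, b in rows]  # XOR: depends on f1 and f2 only jointly

# Relevance of each feature on its own, and of the pair taken together.
mi_f1 = mutual_information(f1, target)
mi_f2 = mutual_information(f2, target)
mi_pair = mutual_information(list(zip(f1, f2)), target)

# Redundancy: how much information the copy shares with the original.
mi_redundant = mutual_information(f1, f3)

print(mi_f1, mi_f2, mi_pair, mi_redundant)
```

On this data each feature alone carries 0 bits about the target, the pair carries 1 bit, and the copied feature shares 1 bit with `f1`. A ranking that scores features one at a time would therefore miss `f1` and `f2`, and one that ignores redundancy would treat `f3` as being as useful as `f1`.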


Bibliographic details

Source
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases | 2017 | pp. 239-255 | 17 pages
Authors
Arvind Kumar Shekar; Tom Bocklisch; Patricia Iglesias Sanchez; Christoph Nikolas Straehle; Emmanuel Mueller
Original format: PDF

