European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases

Including Multi-feature Interactions and Redundancy for Feature Ranking in Mixed Datasets

Abstract

Feature ranking helps to gain knowledge about a high-dimensional dataset and to identify its relevant features. However, in several datasets a few features, taken individually, are only weakly correlated with the target class, yet become strongly correlated with it when combined with some other features. In other words, multiple features interact with one another. Ranking features according to these interactions is necessary for better analysis and better classifier performance, but evaluating the interactions on large datasets is computationally challenging. Furthermore, datasets often contain features with redundant information, and using such redundant features hinders both the efficiency and the generalization capability of the classifier. The major challenge is therefore to rank features efficiently by both relevance and redundancy on mixed datasets. In this work, we propose RaR, a filter-based framework based on Relevance and Redundancy. RaR computes a single score that quantifies feature relevance while accounting for both the interactions between features and their redundancy. The top-ranked features of RaR are characterized by maximum relevance and non-redundancy. An evaluation on synthetic and real-world datasets demonstrates that our approach outperforms several state-of-the-art feature selection techniques. Code and data related to this chapter are available at: https://doi.org/10.6084/m9.figshare.5418706.
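The interaction and redundancy effects described in the abstract can be illustrated with a small, hypothetical XOR example. This is only a sketch using plain empirical mutual information; it is not the RaR scoring method, and the toy dataset and the helper names `entropy` and `mutual_information` are illustrative assumptions. Features `a` and `b` each carry zero information about the target `y = a XOR b`, yet together they determine it completely; feature `c` is an exact copy of `a`, so it is fully redundant with `a` and adds nothing when combined with it.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable values."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Empirical I(X;Y) = H(X) + H(Y) - H(X,Y) from value counts."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical toy dataset: all combinations of two binary features a, b,
# target y = a XOR b, and c as an exact (redundant) copy of a.
rows = [(a, b, a, a ^ b) for a, b in product([0, 1], repeat=2)]
a = [r[0] for r in rows]
b = [r[1] for r in rows]
c = [r[2] for r in rows]
y = [r[3] for r in rows]

# Interaction: each feature alone is uninformative, the pair is fully informative.
print(mutual_information(a, y))                # 0.0
print(mutual_information(b, y))                # 0.0
print(mutual_information(list(zip(a, b)), y))  # 1.0

# Redundancy: c shares all its information with a, so {a, c} still says nothing about y.
print(mutual_information(a, c))                # 1.0
print(mutual_information(list(zip(a, c)), y))  # 0.0
```

A univariate filter that scores each feature separately would rank `a` and `b` as irrelevant here, which is the failure mode that motivates scoring feature subsets and penalizing redundancy.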