Knowledge-Based Systems

An information theoretic approach to quantify the stability of feature selection and ranking algorithms



Abstract

Feature selection is a key step when dealing with high-dimensional data. In particular, these techniques simplify the process of knowledge discovery from the data by selecting the most relevant features and discarding the noisy, redundant and irrelevant ones. A problem that arises in many practical applications is that the outcome of the feature selection algorithm is not stable: small variations in the data may yield very different feature rankings. Assessing the stability of these methods therefore becomes an important issue in such situations. We propose an information-theoretic approach based on the Jensen-Shannon divergence to quantify this robustness. Unlike other stability measures, this metric is suitable for different algorithm outcomes: full ranked lists, feature subsets, as well as the lesser-studied partial ranked lists. This generalized metric quantifies the difference among a whole set of lists of the same size following a probabilistic approach, and is able to give more importance to the disagreements that appear at the top of the list. Moreover, it possesses desirable properties, including correction for chance, upper/lower bounds, and conditions for a deterministic selection. We illustrate the use of this stability metric on data generated in a fully controlled way and compare it with popular metrics, including Spearman's rank correlation and Kuncheva's index, on feature ranking and selection outcomes, respectively. Additionally, experimental validation of the proposed approach is carried out on a real-world food quality assessment problem, showing its potential to quantify stability from different perspectives. (C) 2020 Elsevier B.V. All rights reserved.
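The paper's exact formulation is not reproduced here, but the core idea of the abstract can be sketched as follows: map each ranking to a probability distribution over features (here, an assumed 1/rank weighting so that top positions carry more mass), then measure the generalized Jensen-Shannon divergence among the set of distributions and normalize it into a stability score in [0, 1]. The weighting scheme and normalization below are illustrative assumptions, not the authors' definition.

```python
import numpy as np

def rank_to_distribution(ranking):
    # Convert a full ranking (a permutation of feature indices, best first)
    # into a probability distribution. The assumed 1/position weighting
    # gives more mass to top-ranked features, so disagreements at the top
    # of the list contribute more to the divergence.
    n = len(ranking)
    weights = np.zeros(n)
    for pos, feat in enumerate(ranking, start=1):
        weights[feat] = 1.0 / pos
    return weights / weights.sum()

def entropy(p):
    # Shannon entropy in bits, ignoring zero-probability entries.
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def js_stability(rankings):
    # Generalized Jensen-Shannon divergence among m distributions:
    # JS = H(mean distribution) - mean of the individual entropies.
    # Stability is defined here as 1 - JS / JS_max, so a set of
    # identical rankings scores 1.0 (perfectly stable).
    dists = np.array([rank_to_distribution(r) for r in rankings])
    mean_dist = dists.mean(axis=0)
    js = entropy(mean_dist) - np.mean([entropy(d) for d in dists])
    js_max = np.log2(len(dists))  # upper bound for m distributions
    return 1.0 - js / js_max

# Identical rankings are perfectly stable; disagreeing rankings score lower.
print(js_stability([[0, 1, 2, 3], [0, 1, 2, 3]]))  # → 1.0
print(js_stability([[0, 1, 2, 3], [3, 2, 1, 0]]))  # < 1.0
```

Because the divergence is computed between distributions rather than by pairwise list comparison, the same machinery applies to feature subsets (uniform mass on the selected features) and to partial ranked lists, which is the generality the abstract highlights.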
