Classification with correlated features: unreliability of feature ranking and solutions

Laura Toloşi, Thomas Lengauer

Abstract

Motivation: Classification and feature selection of genomics or transcriptomics data is often hampered by the large number of features as compared with the small number of samples available. Moreover, features represented by probes that either have similar molecular functions (gene expression analysis) or genomic locations (DNA copy number analysis) are highly correlated. Classical model selection methods such as penalized logistic regression or random forest become unstable in the presence of high feature correlations. Sophisticated penalties such as group Lasso or fused Lasso can force the models to assign similar weights to correlated features and thus improve model stability and interpretability. In this article, we show that the measures of feature relevance corresponding to the above-mentioned methods are biased such that the weights of the features belonging to groups of correlated features decrease as the sizes of the groups increase, which leads to incorrect model interpretation and misleading feature ranking.
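The shrinking of per-feature weights as a correlated group grows can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the authors' experiment: it swaps in ridge (L2-penalized) linear regression for the penalized logistic regression named in the abstract, because ridge has a closed-form solution, and it uses exact duplicates of one feature as the extreme case of correlation. The names `ridge_weights` and `simulate`, the effect sizes (1.0 and 0.5), and the penalty `lam=1.0` are all hypothetical choices made for illustration.

```python
import numpy as np


def ridge_weights(X, y, lam=1.0):
    """Closed-form ridge solution w = (X^T X + lam * I)^{-1} X^T y (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def simulate(group_size, n_samples=2000, seed=0):
    """Fit ridge on `group_size` identical copies of a strong feature z
    plus one independent, weaker feature u.

    Returns (weights of the z copies, weight of u).
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)  # strong feature, true effect 1.0
    u = rng.standard_normal(n_samples)  # weaker feature, true effect 0.5
    y = 1.0 * z + 0.5 * u + 0.1 * rng.standard_normal(n_samples)
    # Duplicate z to form a group of perfectly correlated features.
    X = np.column_stack([z] * group_size + [u])
    w = ridge_weights(X, y)
    return w[:group_size], w[group_size]


for k in (1, 2, 4, 8):
    group_w, single_w = simulate(k)
    print(
        f"group size {k}: weight per copy = {group_w[0]:.3f}, "
        f"group total = {group_w.sum():.3f}, weight of u = {single_w:.3f}"
    )
```

In this toy setting the penalty spreads the effect of z evenly over its copies, so each copy receives roughly 1/k of the total. Once the group is large enough, every copy of the stronger feature ranks below the weaker but uncorrelated feature u, which is the kind of misleading ranking the abstract describes. Summing weights within a group recovers the effect of z here; whether that matches the solutions proposed in the article is not stated in this record.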

Bibliographic record

Source: Bioinformatics, 2011, No. 14, pp. 1986-1994 (9 pages)
Authors: Laura Toloşi; Thomas Lengauer
Indexed in: Science Citation Index (SCI, USA); Chemical Abstracts (CA, USA)
Original format: PDF
Language: English

Similar literature

1. Classification with correlated features: unreliability of feature ranking and solutions [J]. Tolosi L, Lengauer T. Bioinformatics, 2011, No. 14.
2. Ensemble feature selection approach based on feature ranking for rice seed images classification [J]. Tuan Dzi Lam Tran, Surinwarangkoon Thongchai, Meethongjan Kittikhun. Advances in Electrical and Electronic Engineering, 2020, No. 3.
3. Invariant optimal feature selection: A distance discriminant and feature ranking based solution [J]. Liang JN, Yang S, Winstanley A. Pattern Recognition: The Journal of the Pattern Recognition Society, 2008, No. 5.
4. A Novel Feature Ranking Criterion for Supervised Interval Valued Feature Selection for Classification [C]. N. Vinay Kumar, D.S. Guru. IAPR International Conference on Document Analysis and Recognition, 2017.
5. Classification of immunoprofiles with combined correlated features algorithm [D]. Guo, Ting. 2015.
6. Histogram-Based Features Selection and Volume of Interest Ranking for Brain PET Image Classification [O]. Imène Garali, Mouloud Adel, Salah Bourennane. 2018.
7. Classification with correlated features: unreliability of feature ranking and solutions [O]. Laura Toloşi, Thomas Lengauer. 2011.