BMC Bioinformatics

A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data


Abstract

Background: Regularized regression methods such as principal component or partial least squares regression perform well in learning tasks on high-dimensional spectral data, but cannot explicitly eliminate irrelevant features. The random forest classifier with its associated Gini feature importance, on the other hand, allows for an explicit feature elimination, but may not be optimally adapted to spectral data due to the topology of its constituent classification trees, which are based on orthogonal splits in feature space.

Results: We propose to combine the best of both approaches, and evaluated the joint use of feature selection, based on recursive feature elimination using the Gini importance of random forests, together with regularized classification methods on spectral data sets from medical diagnostics, chemotaxonomy, biomedical analytics, food science, and synthetically modified spectral data. Here, feature selection using the Gini feature importance, combined with regularized classification by discriminant partial least squares regression, performed as well as or better than filtering according to different univariate statistical tests, or than using regression coefficients in a backward feature elimination. It outperformed both the direct application of the random forest classifier and the direct application of the regularized classifiers on the full set of features.

Conclusion: The Gini importance of the random forest provided superior means for measuring feature relevance on spectral data, but, on an optimal subset of features, the regularized classifiers may be preferable over the random forest classifier, in spite of their limitation to modeling linear dependencies only. A feature selection based on Gini importance, however, may precede a regularized linear classification to identify this optimal subset of features, and to earn a double benefit of both dimensionality reduction and the elimination of noise from the classification task.
