International Journal of Data Warehousing and Mining

Classifying Very High-Dimensional Data with Random Forests Built from Small Subspaces



Abstract

The selection of feature subspaces for growing decision trees is a key step in building random forest models. However, the common approach of randomly sampling a few features for each subspace is not suitable for high-dimensional data consisting of thousands of features, because such data often contains many features that are uninformative for classification, and random sampling frequently fails to include informative features in the selected subspaces. Consequently, the classification performance of the random forest model is significantly affected. In this paper, the authors propose an improved random forest method that uses a novel feature weighting method for subspace selection and thereby enhances classification performance on high-dimensional data. A series of experiments on 9 real-life high-dimensional datasets demonstrated that, using a subspace size of ⌊log₂(M)⌋ + 1 features, where M is the total number of features in the dataset, our random forest model significantly outperforms existing random forest models.
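To make the subspace-selection idea concrete, the following is a minimal Python sketch of a feature-weighted random forest, not the authors' exact algorithm: chi-square scores are assumed as the feature weights, and the helpers weighted_subspace_forest and forest_predict are hypothetical names; the only detail taken from the abstract is the subspace of ⌊log₂(M)⌋ + 1 features drawn with probability proportional to the weights for each tree.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import chi2

def weighted_subspace_forest(X, y, n_trees=100, random_state=0):
    # Hypothetical illustration: chi-square scores stand in for the paper's
    # feature weights (chi2 requires a non-negative feature matrix X).
    rng = np.random.default_rng(random_state)
    n_samples, n_features = X.shape
    subspace_size = int(np.floor(np.log2(n_features))) + 1   # floor(log2(M)) + 1

    scores, _ = chi2(X, y)
    scores = np.nan_to_num(scores, nan=0.0)
    probs = scores / scores.sum()            # weights -> sampling distribution

    trees = []
    for _ in range(n_trees):
        rows = rng.integers(0, n_samples, size=n_samples)      # bootstrap sample
        feats = rng.choice(n_features, size=subspace_size,
                           replace=False, p=probs)             # weighted subspace
        tree = DecisionTreeClassifier(random_state=random_state)
        tree.fit(X[np.ix_(rows, feats)], y[rows])
        trees.append((tree, feats))
    return trees

def forest_predict(trees, X):
    # Majority vote over per-tree predictions (assumes integer-encoded labels).
    votes = np.stack([t.predict(X[:, feats]) for t, feats in trees])
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)

Sampling the subspace in proportion to the feature weights raises the chance that every tree sees at least one informative feature, which is exactly the failure mode of uniform random sampling that the abstract describes for data with thousands of mostly uninformative features.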


