International Journal of Data Warehousing and Mining

Classifying Very High-Dimensional Data with Random Forests Built from Small Subspaces



Abstract

The selection of feature subspaces for growing decision trees is a key step in building random forest models. However, the common approach of randomly sampling a few features for each subspace is not suitable for high-dimensional data consisting of thousands of features, because such data often contains many features that are uninformative for classification, and random sampling frequently fails to include informative features in the selected subspaces. Consequently, the classification performance of the random forest model is significantly affected. In this paper, the authors propose an improved random forest method that uses a novel feature weighting method for subspace selection and thereby enhances classification performance on high-dimensional data. A series of experiments on 9 real-life high-dimensional datasets demonstrated that, using a subspace size of ⌊log₂(M)⌋ + 1 features, where M is the total number of features in the dataset, our random forest model significantly outperforms existing random forest models.
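To make the subspace-selection idea concrete, the following is a minimal Python sketch of a feature-weighted random forest, not the authors' exact algorithm: chi-square scores are assumed as the feature weights, and the helpers weighted_subspace_forest and forest_predict are hypothetical names; the only detail taken from the abstract is the subspace of ⌊log₂(M)⌋ + 1 features drawn with probability proportional to the weights for each tree.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import chi2

def weighted_subspace_forest(X, y, n_trees=100, random_state=0):
    # Hypothetical illustration: chi-square scores stand in for the paper's
    # feature weights (chi2 requires a non-negative feature matrix X).
    rng = np.random.default_rng(random_state)
    n_samples, n_features = X.shape
    subspace_size = int(np.floor(np.log2(n_features))) + 1   # floor(log2(M)) + 1

    scores, _ = chi2(X, y)
    scores = np.nan_to_num(scores, nan=0.0)
    probs = scores / scores.sum()            # weights -> sampling distribution

    trees = []
    for _ in range(n_trees):
        rows = rng.integers(0, n_samples, size=n_samples)      # bootstrap sample
        feats = rng.choice(n_features, size=subspace_size,
                           replace=False, p=probs)             # weighted subspace
        tree = DecisionTreeClassifier(random_state=random_state)
        tree.fit(X[np.ix_(rows, feats)], y[rows])
        trees.append((tree, feats))
    return trees

def forest_predict(trees, X):
    # Majority vote over per-tree predictions (assumes integer-encoded labels).
    votes = np.stack([t.predict(X[:, feats]) for t, feats in trees])
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)

Sampling the subspace in proportion to the feature weights raises the chance that every tree sees at least one informative feature, which is exactly the failure mode of uniform random sampling that the abstract describes for data with thousands of mostly uninformative features.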


