Journal: Pattern Recognition: The Journal of the Pattern Recognition Society
Stratified sampling for feature subspace selection in random forests for high dimensional data

Abstract

For high dimensional data, a large portion of the features are often not informative of the class of the objects. Random forest algorithms tend to use simple random sampling of features in building their decision trees and consequently select many subspaces that contain few, if any, informative features. In this paper we propose a stratified sampling method to select the feature subspaces for random forests with high dimensional data. The key idea is to stratify features into two groups: one group contains the strongly informative features and the other the weakly informative features. Then, for feature subspace selection, we randomly select features from each group proportionally. The advantage of stratified sampling is that it ensures that each subspace contains enough informative features for classification in high dimensional data. Testing on both synthetic data and various real data sets in gene classification, image categorization and face recognition consistently demonstrates the effectiveness of this new method. Its performance is shown to surpass that of state-of-the-art algorithms, including SVM, four variants of random forests (RF, ERT, enrich-RF, and oblique-RF), and nearest neighbor (NN) algorithms.
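
The subspace selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how features are scored or where the strong/weak boundary lies, so the per-feature `scores` array, the mean-score threshold, and the function names `stratify_features` and `sample_subspace` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def stratify_features(scores, threshold=None):
    """Split feature indices into a strong and a weak stratum.

    Assumption: `scores` holds one informativeness score per feature and
    the mean score is used as the cut-off when no threshold is given.
    """
    scores = np.asarray(scores, dtype=float)
    if threshold is None:
        threshold = scores.mean()
    strong = np.flatnonzero(scores > threshold)
    weak = np.flatnonzero(scores <= threshold)
    return strong, weak


def sample_subspace(strong, weak, subspace_size, rng):
    """Draw one feature subspace, sampling from each stratum proportionally."""
    n_total = len(strong) + len(weak)
    if not 0 < subspace_size <= n_total:
        raise ValueError("subspace_size must be in 1..number of features")

    # Proportional allocation: the strong share mirrors the stratum's size.
    n_strong = int(round(subspace_size * len(strong) / n_total))

    # Keep at least one feature from each non-empty stratum where possible,
    # and never ask a stratum for more features than it holds.
    weak_min = 1 if (len(weak) > 0 and subspace_size > 1) else 0
    lower = max(1 if len(strong) > 0 else 0, subspace_size - len(weak))
    upper = min(len(strong), subspace_size - weak_min)
    n_strong = int(np.clip(n_strong, lower, upper))
    n_weak = subspace_size - n_strong

    # Shuffle each stratum and take a prefix: sampling without replacement.
    chosen_strong = rng.permutation(strong)[:n_strong]
    chosen_weak = rng.permutation(weak)[:n_weak]
    return np.concatenate([chosen_strong, chosen_weak])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 10 informative features among 100 (hypothetical scores).
    scores = np.zeros(100)
    scores[:10] = 1.0
    strong, weak = stratify_features(scores)
    print(sample_subspace(strong, weak, subspace_size=10, rng=rng))
```

In a random forest, a fresh subspace like this would be drawn wherever simple random feature sampling is normally used. With simple random sampling on the toy data above, a 10-feature subspace contains no informative feature in roughly a third of the draws, whereas the stratified draw always includes at least one.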
