Journal: Communications in Statistics

Combining clustering of variables and feature selection using random forests

Abstract

Standard approaches to high-dimensional supervised classification often include variable selection and dimension reduction. The proposed methodology combines clustering of variables with feature selection. Hierarchical clustering of variables builds groups of correlated variables and summarizes each group by a synthetic variable. The originality of the approach is that the groups of variables are not known a priori. Moreover, the clustering approach handles both numerical and categorical variables. Among all possible partitions, the most relevant synthetic variables are selected by a procedure based on random forests. Numerical performance is illustrated on simulated and real datasets. Selecting groups of variables makes the results easier to interpret.
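
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it handles numerical variables only, uses 1 - r² as the dissimilarity between variables, takes the first principal component of each group as its synthetic variable, and compares partitions by the out-of-bag accuracy of a random forest. The function names (`synthetic_variables`, `score_partition`) and the simulated data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.ensemble import RandomForestClassifier


def synthetic_variables(X, tree, n_groups):
    """Cut the variable dendrogram into at most n_groups groups and summarize
    each group by the first principal component of its standardized columns."""
    labels = fcluster(tree, t=n_groups, criterion="maxclust")
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    groups = np.unique(labels)
    synth = []
    for g in groups:
        block = Z[:, labels == g]
        u, s, _ = np.linalg.svd(block, full_matrices=False)
        synth.append(u[:, 0] * s[0])  # scores on the first principal component
    return labels, groups, np.column_stack(synth)


def score_partition(X, y, tree, n_groups):
    """Fit a random forest on the synthetic variables of one partition."""
    labels, groups, S = synthetic_variables(X, tree, n_groups)
    rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
    rf.fit(S, y)
    return {
        "labels": labels,
        "groups": groups,
        "oob": rf.oob_score_,
        "importances": rf.feature_importances_,
    }


# Simulated data (an assumption for illustration): 3 latent factors, each
# observed through 4 noisy copies; only the first factor drives the class.
rng = np.random.default_rng(0)
n = 300
latent = rng.normal(size=(n, 3))
X = np.repeat(latent, 4, axis=1) + 0.3 * rng.normal(size=(n, 12))
y = (latent[:, 0] > 0).astype(int)

# Hierarchical clustering of the variables (columns), not the observations.
corr = np.corrcoef(X, rowvar=False)
dist = 1.0 - corr ** 2  # assumed dissimilarity between variables
np.fill_diagonal(dist, 0.0)
tree = linkage(squareform(dist, checks=False), method="average")

# Compare several partitions of the dendrogram and keep the best-scoring one.
results = {k: score_partition(X, y, tree, k) for k in range(2, 7)}
best_k = max(results, key=lambda k: results[k]["oob"])

res = results[best_k]
best_group = res["groups"][np.argmax(res["importances"])]
print("chosen number of groups:", best_k)
print("most important group contains variables:",
      np.flatnonzero(res["labels"] == best_group))
```

Reporting the selected group as a set of original variable indices, rather than a single feature, is what gives the group-level interpretation the abstract mentions. Handling categorical variables would require a different dissimilarity and a different way of building the synthetic variable, which this sketch does not attempt.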