Conference: Multiple Classifier Systems

Random Ordinality Ensembles: A Novel Ensemble Method for Multi-valued Categorical Data

Abstract

Data with multi-valued categorical attributes can cause major problems for decision trees. The high branching factor can lead to data fragmentation, where decisions have little or no statistical support. In this paper, we propose a new ensemble method, Random Ordinality Ensembles (ROE), that circumvents this problem and provides significantly improved accuracies over other popular ensemble methods. We perform a random projection of the categorical data into a continuous space by imposing a random ordinality on the values of each categorical attribute. A decision tree that learns on this new continuous space is able to use binary splits, hence avoiding the data fragmentation problem. A majority-vote ensemble is then constructed from several trees, each learnt from a different continuous space. An empirical evaluation on 13 datasets shows that this simple method significantly outperforms standard techniques such as Boosting and Random Forests. A theoretical study using an information gain framework is carried out to explain the performance of Random Ordinality (RO). The study shows that ROE is quite robust to the data fragmentation problem and that RO trees are significantly smaller than trees generated using multi-way splits.
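The procedure described in the abstract (random ordering of category values, binary-split trees, majority vote) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`random_ordinality`, `build_tree`, `fit_roe`, `predict_roe`), the Gini split criterion, the depth limit, and the default of 25 trees are all assumptions made for illustration, and the sketch assumes every category seen at prediction time also appeared in training.

```python
import random
from collections import Counter

def random_ordinality(values, rng):
    """Assign each distinct category a random rank in 0..k-1 (hypothetical helper)."""
    cats = sorted(set(values))
    ranks = list(range(len(cats)))
    rng.shuffle(ranks)
    return dict(zip(cats, ranks))

def encode(rows, mappings):
    """Replace every categorical value by its rank under the per-column mapping."""
    return [[m[v] for m, v in zip(mappings, row)] for row in rows]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, depth=0, max_depth=8):
    """Greedy binary-split tree on numeric features.
    Assumption: Gini impurity is used here for brevity; class labels must not be tuples."""
    if len(set(y)) == 1 or depth == max_depth:
        return Counter(y).most_common(1)[0][0]
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted(set(r[j] for r in X))[:-1]:
            left = [lab for r, lab in zip(X, y) if r[j] <= t]
            right = [lab for r, lab in zip(X, y) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:  # all rows identical, no split possible
        return Counter(y).most_common(1)[0][0]
    _, j, t = best
    lo = [(r, lab) for r, lab in zip(X, y) if r[j] <= t]
    hi = [(r, lab) for r, lab in zip(X, y) if r[j] > t]
    return (j, t,
            build_tree([r for r, _ in lo], [l for _, l in lo], depth + 1, max_depth),
            build_tree([r for r, _ in hi], [l for _, l in hi], depth + 1, max_depth))

def predict_tree(node, row):
    # Internal nodes are (feature, threshold, left, right) tuples; leaves are labels.
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node

def fit_roe(rows, y, n_trees=25, seed=0):
    """Train n_trees trees, each on its own random ordinal encoding of the data."""
    rng = random.Random(seed)
    columns = list(zip(*rows))
    ensemble = []
    for _ in range(n_trees):
        mappings = [random_ordinality(col, rng) for col in columns]
        ensemble.append((mappings, build_tree(encode(rows, mappings), y)))
    return ensemble

def predict_roe(ensemble, row):
    """Majority vote over the trees, each applying its own encoding to the row."""
    votes = Counter(
        predict_tree(tree, [m[v] for m, v in zip(mappings, row)])
        for mappings, tree in ensemble
    )
    return votes.most_common(1)[0][0]

# Toy usage: one four-valued categorical attribute.
rows = [("red",), ("green",), ("blue",), ("yellow",)] * 5
labels = ["warm", "cool", "cool", "warm"] * 5
model = fit_roe(rows, labels, n_trees=5, seed=1)
predictions = [predict_roe(model, r) for r in rows]
```

Because each tree sees the categories in a different random order, a given binary threshold groups the categories differently from tree to tree, which is where the diversity of the ensemble comes from in this sketch.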
