
A bottom-up oblique decision tree induction algorithm


Abstract

Decision tree induction algorithms are widely used in knowledge discovery and data mining, especially in scenarios where model comprehensibility is desired. A variation of the traditional univariate approach is the so-called oblique decision tree, which allows multivariate tests in its non-terminal nodes. Oblique decision trees can model decision boundaries that are oblique to the attribute axes, whereas univariate trees can only perform axis-parallel splits. Most oblique and univariate decision tree induction algorithms follow a top-down strategy for growing the tree, relying on an impurity-based measure for splitting nodes. In this paper, we propose a novel bottom-up algorithm for inducing oblique trees named BUTIA. It does not require an impurity measure for dividing nodes, because the data resulting from each split are known a priori. For generating the splitting hyperplanes, our algorithm implements a support vector machine solution, and a clustering algorithm is used for generating the initial leaves. We compare BUTIA to traditional univariate and oblique decision tree algorithms (C4.5, CART, OC1 and FT), as well as to a standard SVM implementation, using real gene expression benchmark data. Experimental results show the effectiveness of the proposed approach in several cases.
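
To make the bottom-up idea concrete, here is a minimal Python sketch. It is not BUTIA itself, since the abstract only states that a clustering algorithm produces the initial leaves and that an SVM produces the splitting hyperplanes. Everything else is an assumption made for illustration: one initial leaf per class stands in for the clustering step, the two nodes with the closest centroids are merged first, and the hyperplane comes from a small hand-written linear SVM (subgradient descent on the hinge loss). The names `train_linear_svm`, `build_tree` and `predict` are hypothetical.

```python
from itertools import combinations
from math import dist

def mean(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def train_linear_svm(neg, pos, lr=0.1, lam=0.01, epochs=300):
    """Soft-margin linear SVM fitted by full-batch subgradient descent on the
    hinge loss. Illustrative stand-in for a real SVM solver."""
    data = [(x, -1.0) for x in neg] + [(x, 1.0) for x in pos]
    centre = mean([x for x, _ in data])  # centre the inputs for better conditioning
    data = [([xi - ci for xi, ci in zip(x, centre)], y) for x, y in data]
    n = len(data)
    w, b = [0.0] * len(centre), 0.0
    for _ in range(epochs):
        gw, gb = [lam * wi for wi in w], 0.0
        for x, y in data:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) < 1:  # margin violated
                gw = [g - y * xi / n for g, xi in zip(gw, x)]
                gb -= y / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    # Undo the centring so the hyperplane applies to raw inputs.
    return w, b - sum(wi * ci for wi, ci in zip(w, centre))

def build_tree(X, y):
    """Grow the tree from the leaves up. Assumption: one initial leaf per class."""
    nodes = [{"label": c, "points": [x for x, t in zip(X, y) if t == c]}
             for c in sorted(set(y))]
    while len(nodes) > 1:
        # Assumption: merge the pair of nodes whose centroids are closest.
        left, right = min(
            combinations(nodes, 2),
            key=lambda p: dist(mean(p[0]["points"]), mean(p[1]["points"])),
        )
        # The data on each side of the split is known in advance, so no
        # impurity measure is needed: fit a hyperplane between the two groups.
        w, b = train_linear_svm(left["points"], right["points"])
        parent = {"w": w, "b": b, "left": left, "right": right,
                  "points": left["points"] + right["points"]}
        nodes = [n for n in nodes if n is not left and n is not right] + [parent]
    return nodes[0]

def predict(node, x):
    """Follow the oblique tests w.x + b from the root down to a leaf."""
    while "label" not in node:
        score = sum(wi * xi for wi, xi in zip(node["w"], x)) + node["b"]
        node = node["left"] if score < 0 else node["right"]
    return node["label"]

# Toy data: three well-separated classes in two dimensions.
X = [(0, 0), (0, 1), (1, 0), (1, 1),
     (5, 5), (5, 6), (6, 5), (6, 6),
     (12, 0), (12, 1), (13, 0), (13, 1)]
y = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
tree = build_tree(X, y)
predictions = [predict(tree, x) for x in X]
```

On this toy data the classes "a" and "b" are merged first, and their parent is then merged with "c" to form the root. Each internal node tests a linear combination of both attributes, which is what distinguishes an oblique split from an axis-parallel one.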
