Electronic Colloquium on Computational Complexity

Top-down induction of decision trees: rigorous guarantees and inherent limitations

Abstract

Consider the following heuristic for building a decision tree for a function $f : \{0,1\}^n \to \{\pm 1\}$. Place the most influential variable $x_i$ of $f$ at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ on the left and right subtrees respectively; terminate once the tree is an $\varepsilon$-approximation of $f$. We analyze the quality of this heuristic, obtaining near-matching upper and lower bounds:

  • Upper bound: For every $f$ with decision tree size $s$ and every $\varepsilon \in (0, \frac{1}{2})$, this heuristic builds a decision tree of size at most $s^{O(\log(s/\varepsilon)\log(1/\varepsilon))}$.
  • Lower bound: For every $\varepsilon \in (0, \frac{1}{2})$ and $s \le 2^{\tilde{O}(\sqrt{n})}$, there is an $f$ with decision tree size $s$ such that this heuristic builds a decision tree of size $s^{\tilde{\Omega}(\log s)}$.

We also obtain upper and lower bounds for monotone functions: $s^{\tilde{O}(\sqrt{\log s})}$ and $s^{\tilde{\Omega}(\sqrt[4]{\log s})}$ respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004) and Lee (2009).

Our upper bounds yield new algorithms for properly learning decision trees under the uniform distribution. We show that these algorithms, which are motivated by widely employed and empirically successful top-down decision tree learning heuristics such as ID3, C4.5, and CART, achieve provable guarantees that compare favorably with those of the current fastest algorithm (Ehrenfeucht and Haussler, 1989), and even have certain qualitative advantages. Our lower bounds shed new light on the limitations of these heuristics. Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend it to give the first uniform-distribution proper learning algorithm that achieves polynomial sample and memory complexity, while matching its state-of-the-art quasipolynomial runtime.
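To make the heuristic concrete, the following is a minimal Python sketch (not from the paper) for small $n$, where influences can be computed by exhaustive enumeration over the current subcube. It uses a per-leaf stopping rule as a simple proxy for the global $\varepsilon$-approximation criterion, and all names (build_top_down, influence) are illustrative.

from itertools import product

def build_top_down(f, n, eps, fixed=None):
    # Grow a decision tree for f: {0,1}^n -> {+1,-1} by splitting on the
    # most influential free variable; stop once the best constant label
    # errs on at most an eps fraction of the current subcube.
    fixed = fixed or {}
    free = [i for i in range(n) if i not in fixed]
    # All inputs consistent with the restrictions made so far.
    points = [[dict(zip(free, bits), **fixed)[i] for i in range(n)]
              for bits in product([0, 1], repeat=len(free))]
    values = [f(x) for x in points]
    maj = max(set(values), key=values.count)           # majority label
    err = sum(v != maj for v in values) / len(values)  # its error here
    if err <= eps or not free:
        return maj  # leaf
    # Influence of x_i on the subfunction: how often flipping bit i flips f.
    def influence(i):
        return sum(f(x[:i] + [1 - x[i]] + x[i + 1:]) != f(x) for x in points)
    i = max(free, key=influence)
    return (i,
            build_top_down(f, n, eps, {**fixed, i: 0}),  # subtree for x_i = 0
            build_top_down(f, n, eps, {**fixed, i: 1}))  # subtree for x_i = 1

# Example: 3-bit majority; eps = 0 demands an exact tree.
maj3 = lambda x: 1 if sum(x) >= 2 else -1
print(build_top_down(maj3, 3, 0.0))

Since all three variables of 3-bit majority have equal influence, ties are broken by index; the returned tree is a nested tuple (variable, left subtree, right subtree) with $\pm 1$ leaves.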