Twenty-Second International Conference on Very Large Data Bases (VLDB'96)

Constructing Efficient Decision Trees by Using Optimized Numeric Association Rules


Abstract

We propose an extension of Quinlan's entropy-based heuristic [Q93] for constructing a decision tree from a large database with many numeric attributes. Quinlan pointed out that his original method (as well as other existing methods) may be inefficient if any numeric attributes are strongly correlated. Our approach offers one solution to this problem. For each pair of numeric attributes with strong correlation, we compute a two-dimensional association rule with respect to these attributes and the objective attribute of the decision tree. In particular, we consider a family ℛ of grid-regions in the plane associated with the pair of attributes. For each R ∈ ℛ, the data can be split into two classes: data inside R and data outside R. We compute the region R_opt ∈ ℛ that minimizes the entropy of the splitting, and add the splitting associated with R_opt (for each pair of strongly correlated attributes) to the set of candidate tests in Quinlan's entropy-based heuristic.
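
The region-splitting idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not say which family of grid-regions is used or how the optimum is found, so the sketch assumes axis-parallel rectangles of grid cells, a binary objective attribute, and an exhaustive search. The names `entropy`, `split_entropy`, and `best_rectangle` are hypothetical.

```python
import math


def entropy(pos, total):
    """Binary entropy (in bits) of a set with `pos` positive tuples out of `total`."""
    if total == 0 or pos == 0 or pos == total:
        return 0.0
    p = pos / total
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def split_entropy(pos_in, n_in, pos_all, n_all):
    """Size-weighted entropy of the split into 'inside R' and 'outside R'."""
    pos_out, n_out = pos_all - pos_in, n_all - n_in
    return (n_in / n_all) * entropy(pos_in, n_in) + (n_out / n_all) * entropy(pos_out, n_out)


def best_rectangle(pos, tot):
    """Exhaustively search rectangles of grid cells (an assumed region family).

    pos[i][j] and tot[i][j] hold the number of positive tuples and of all
    tuples falling in grid cell (i, j). Returns (entropy, rectangle), where
    rectangle is (row_lo, row_hi, col_lo, col_hi) with inclusive bounds.
    """
    rows, cols = len(tot), len(tot[0])
    pos_all = sum(map(sum, pos))
    n_all = sum(map(sum, tot))
    best = (float("inf"), None)
    for r0 in range(rows):
        for r1 in range(r0, rows):
            for c0 in range(cols):
                for c1 in range(c0, cols):
                    cells = [(i, j) for i in range(r0, r1 + 1) for j in range(c0, c1 + 1)]
                    p_in = sum(pos[i][j] for i, j in cells)
                    n_in = sum(tot[i][j] for i, j in cells)
                    if n_in == 0 or n_in == n_all:
                        continue  # skip trivial splits with an empty side
                    e = split_entropy(p_in, n_in, pos_all, n_all)
                    if e < best[0]:
                        best = (e, (r0, r1, c0, c1))
    return best
```

Calling `best_rectangle(pos, tot)` on per-cell counts returns the lowest split entropy found and the rectangle achieving it; that split would then be one more candidate test alongside the usual single-attribute tests. The brute-force search is only practical for small grids.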
