International Conference on Very Large Data Bases

Constructing Efficient Decision Trees by Using Optimized Numeric Association Rules


Abstract

We propose an extension of Quinlan's entropy-based heuristic [Q93] for constructing a decision tree from a large database with many numeric attributes. Quinlan pointed out that his original method (as well as other existing methods) may be inefficient if any numeric attributes are strongly correlated. Our approach offers one solution to this problem. For each pair of strongly correlated numeric attributes, we compute a two-dimensional association rule with respect to these attributes and the objective attribute of the decision tree. In particular, we consider a family ℛ of grid regions in the plane associated with the pair of attributes. Each region R ∈ ℛ splits the data into two classes: data inside R and data outside R. We compute the region R_opt ∈ ℛ that minimizes the entropy of the splitting, and add the splitting associated with R_opt (for each pair of strongly correlated attributes) to the set of candidate tests in Quinlan's entropy-based heuristic.
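The region-based split described in the abstract can be illustrated with a small sketch. The abstract does not specify the family of grid regions or the search algorithm, so the code below makes several assumptions: the region family is restricted to axis-parallel rectangles of grid cells, the objective attribute is binary, the search is a brute-force enumeration, and the names `entropy`, `split_entropy`, and `best_rectangle` are hypothetical. It is meant only to show what "the region that minimizes the entropy of the splitting" means, not to reproduce the paper's method.

```python
import math

def entropy(pos, total):
    """Binary entropy (in bits) of a group with `pos` positives out of `total`."""
    if total == 0 or pos == 0 or pos == total:
        return 0.0
    p = pos / total
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def split_entropy(pos_in, n_in, pos_all, n_all):
    """Weighted entropy of splitting the data into 'inside R' and 'outside R'."""
    n_out = n_all - n_in
    pos_out = pos_all - pos_in
    return (n_in / n_all) * entropy(pos_in, n_in) \
         + (n_out / n_all) * entropy(pos_out, n_out)

def best_rectangle(count, pos):
    """Brute-force search over axis-parallel rectangles of grid cells.

    count[i][j] is the number of tuples in grid cell (i, j);
    pos[i][j] is how many of them have a positive objective attribute.
    Rectangles are an ASSUMED region family, chosen for simplicity.
    Returns (minimum split entropy, (row0, row1, col0, col1)).
    """
    rows, cols = len(count), len(count[0])
    n_all = sum(map(sum, count))
    pos_all = sum(map(sum, pos))
    best = (float("inf"), None)
    for r0 in range(rows):
        for r1 in range(r0, rows):
            for c0 in range(cols):
                for c1 in range(c0, cols):
                    cells = [(i, j) for i in range(r0, r1 + 1)
                                    for j in range(c0, c1 + 1)]
                    n_in = sum(count[i][j] for i, j in cells)
                    if n_in == 0 or n_in == n_all:
                        continue  # not a real split of the data
                    p_in = sum(pos[i][j] for i, j in cells)
                    h = split_entropy(p_in, n_in, pos_all, n_all)
                    if h < best[0]:
                        best = (h, (r0, r1, c0, c1))
    return best
```

For instance, on a 3×3 grid in which only the centre cell contains positive tuples, `best_rectangle` selects the rectangle covering just that cell, since it separates the two classes perfectly. In a tree builder, the resulting "inside R / outside R" test would be one more candidate alongside the usual single-attribute threshold tests.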
