Journal: Data Mining and Knowledge Discovery

PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning

Abstract

Classification is an important problem in data mining. Given a database of records, each with a class label, a classifier generates a concise and meaningful description for each class that can be used to classify subsequent records. A number of popular classifiers construct decision trees to generate class models. These classifiers first build a decision tree and then prune subtrees from it in a subsequent pruning phase to improve accuracy and prevent "overfitting". Generating the decision tree in two distinct phases can waste a substantial amount of effort, since an entire subtree constructed in the first phase may later be pruned in the second. In this paper, we propose PUBLIC, an improved decision tree classifier that integrates the second "pruning" phase with the initial "building" phase. In PUBLIC, a node is not expanded during the building phase if it is determined that it will be pruned during the subsequent pruning phase. To make this determination for a node before it is expanded, PUBLIC computes a lower bound on the cost of the minimum-cost subtree rooted at the node. PUBLIC then uses this estimate to identify the nodes that are certain to be pruned and expends no effort on splitting them. Experimental results with real-life as well as synthetic data sets demonstrate the effectiveness of PUBLIC's integrated approach, which can deliver substantial performance improvements.
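The pruning test described in the abstract can be illustrated with a small sketch. The abstract does not specify the cost model or the lower bound itself, so everything below is an assumption made for illustration: the `Node` fields, the `lower_bound` placeholder (a constant here), and the rule that a node is pruned when its cost as a leaf does not exceed the bounded cost of its partially built subtree.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # All fields are hypothetical; costs are in arbitrary units.
    leaf_cost: float                  # cost of keeping this node as a leaf
    split_cost: float = 0.0           # cost of encoding this node's split
    children: List["Node"] = field(default_factory=list)
    expandable: bool = True           # may still be split by the builder


def lower_bound(node: Node) -> float:
    """Placeholder lower bound on the cost of any subtree that splits
    `node`. The constant is an assumption, not the paper's bound."""
    return 1.0


def prune_partial(node: Node) -> float:
    """Return a lower bound on the minimum-cost subtree rooted at `node`
    in a partially built tree, pruning nodes that are certain to be pruned."""
    if not node.children:
        if node.expandable:
            # Not yet expanded: the final subtree costs at least this much.
            return min(node.leaf_cost, lower_bound(node))
        return node.leaf_cost
    subtree_cost = node.split_cost + sum(prune_partial(c) for c in node.children)
    if node.leaf_cost <= subtree_cost:
        # Even the optimistic subtree cost is no better than a leaf,
        # so this node would be pruned later: stop expanding below it.
        node.children = []
        node.expandable = False
        return node.leaf_cost
    return subtree_cost


# A cheap-as-leaf root is pruned before its children are ever split.
pruned_root = Node(leaf_cost=3.0, split_cost=2.0,
                   children=[Node(leaf_cost=10.0), Node(leaf_cost=10.0)])
pruned_cost = prune_partial(pruned_root)

# An expensive-as-leaf root is kept, and its children remain expandable.
kept_root = Node(leaf_cost=10.0, split_cost=2.0,
                 children=[Node(leaf_cost=10.0), Node(leaf_cost=10.0)])
kept_cost = prune_partial(kept_root)
```

In this sketch a builder would call `prune_partial` on the root periodically and skip any node whose `expandable` flag has been cleared, which is how the wasted splitting effort is avoided. A tighter `lower_bound` would let more nodes be pruned early.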
