BOAT-Optimistic Decision Tree Construction

Abstract

Classification is an important data mining problem. Given a training database of records, each tagged with a class label, the goal of classification is to build a concise model that can be used to predict the class label of future, unlabeled records. Decision trees are a very popular class of classifiers. All current algorithms to construct decision trees, including all main-memory algorithms, make one scan over the training database per level of the tree. We introduce a new algorithm (BOAT) for decision tree construction that improves upon earlier algorithms in both performance and functionality. BOAT constructs several levels of the tree in only two scans over the training database, resulting in an average performance gain of 300% over previous work. The key to this performance improvement is a novel optimistic approach to tree construction in which we construct an initial tree using a small subset of the data and refine it to arrive at the final tree. We guarantee that any difference with respect to the "real" tree (i.e., the tree that would be constructed by examining all the data in a traditional way) is detected and corrected. The correction step occasionally requires us to make additional scans over subsets of the data; this situation rarely arises in practice and can be addressed at little added cost. Beyond offering faster tree construction, BOAT is the first scalable algorithm that can incrementally update the tree with respect to both insertions into and deletions from the dataset. This property is valuable in dynamic environments such as data warehouses, in which the training dataset changes over time. The BOAT update operation is much cheaper than completely rebuilding the tree, and the resulting tree is guaranteed to be identical to the tree that would be produced by a complete rebuild.
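
The optimistic idea described in the abstract can be illustrated for a single tree node with one numeric attribute. The Python sketch below is not the BOAT algorithm itself, only a simplified illustration under stated assumptions: the function names `best_split` and `optimistic_split`, the parameters `sample_size`, `replicas` and `seed`, the use of Gini impurity, and the use of bootstrap replicas to bound the split point are all choices made for this example. It first estimates an interval of plausible split points from a small in-memory sample, then makes one pass over all records, keeping in memory only the records inside that interval plus class counts for the rest, and finally finds the exact best split within the interval. The detection and correction step that the abstract guarantees is not implemented here: if the true best split lies outside the interval, this sketch returns a different answer than an exhaustive search would.

```python
import random
from collections import Counter


def gini(counts):
    """Gini impurity of a mapping from class label to count."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_gini(left, right):
    """Size-weighted Gini impurity of a two-way partition."""
    nl, nr = sum(left.values()), sum(right.values())
    return (nl * gini(left) + nr * gini(right)) / (nl + nr)


def best_split(records, left_base=None, right_base=None):
    """Exact best threshold t (rule: x <= t) among the given (x, label) records.

    left_base / right_base are class counts of records known to lie to the
    left / right of every candidate threshold; they are not stored in memory.
    """
    left = Counter(left_base or {})
    right = Counter(right_base or {})
    records = sorted(records)
    for _, y in records:
        right[y] += 1
    best_t, best_score = None, float("inf")
    for i, (x, y) in enumerate(records):
        left[y] += 1
        right[y] -= 1
        if i + 1 < len(records) and records[i + 1][0] == x:
            continue  # only split between distinct attribute values
        if sum(right.values()) == 0:
            continue  # a split with an empty right side is not a split
        score = weighted_gini(left, right)
        if score < best_score:
            best_t, best_score = x, score
    return best_t, best_score


def optimistic_split(data, sample_size=200, replicas=20, seed=0):
    """Sample-then-refine split search; returns (threshold, (lo, hi))."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))

    # Phase 1: bootstrap the sample to get an interval of plausible thresholds.
    guesses = []
    for _ in range(replicas):
        boot = [rng.choice(sample) for _ in sample]
        t, _ = best_split(boot)
        if t is not None:
            guesses.append(t)
    if not guesses:  # degenerate sample: fall back to the exhaustive search
        t, _ = best_split(data)
        return t, (t, t)
    lo, hi = min(guesses), max(guesses)

    # Phase 2: one pass over all records; keep only those inside [lo, hi].
    left, right, inside = Counter(), Counter(), []
    for x, y in data:
        if x < lo:
            left[y] += 1
        elif x > hi:
            right[y] += 1
        else:
            inside.append((x, y))

    # Exact best split restricted to the interval, using counts for the rest.
    t, _ = best_split(inside, left, right)
    return t, (lo, hi)
```

Calling `optimistic_split(data)` on a list of `(value, label)` pairs returns the chosen threshold together with the interval it was searched in. Whenever the exhaustive answer `best_split(data)[0]` falls inside that interval, the two results coincide, because the restricted search evaluates every in-interval candidate with exact full-data class counts.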

Bibliographic details

Source
ACM SIGMOD International Conference on Management of Data | 1999 | 12 pages
Authors
Johannes Gehrke; Venkatesh Ganti; Raghu Ramakrishnan; Wei-Yin Loh
Original format: PDF
Chinese Library Classification: TP3-532
