Data abstractions for decision tree induction

Yoshimitsu Kudoh; Makoto Haraguchi; Yoshiaki Okubo

Abstract

When the data values in a database are described too concretely or in too much detail, the computational cost of discovering useful knowledge from the database generally increases. Furthermore, the discovered knowledge tends to become complicated. A notion of data abstraction seems useful for resolving this kind of problem: after abstraction we obtain a smaller and more general database, from which we can quickly extract more abstract knowledge that is expected to be easier to understand. In general, however, several abstractions are possible, so we have to carefully select the one according to which the original database is generalized. An inadequate selection would reduce the accuracy of the extracted knowledge. From this point of view, we propose in this paper a method of selecting an appropriate abstraction from the possible ones, assuming that our task is to construct a decision tree from a relational database. Suppose that, for each attribute in a relational database, we have a class of possible abstractions for the attribute values. As an appropriate abstraction for each attribute, we prefer one such that, even after the abstraction, the distribution of target classes needed to perform our classification task is preserved within an acceptable error range given by the user. With the selected abstractions, the original database can be transformed into a small generalized database written in abstract values. It is therefore expected that, from the generalized database, we can construct a decision tree that is much smaller than one constructed from the original database. Furthermore, such a size reduction can be justified under some theoretical assumptions. The appropriateness of an abstraction is precisely defined in terms of standard information theory. We therefore call our abstraction framework Information Theoretical Abstraction.
We show some experimental results obtained with a system, ITA, that implements our abstraction method. These results verify that our method is very effective in reducing the size of the constructed decision tree without substantially worsening classification error.
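The selection criterion described in the abstract can be illustrated with a small sketch. The abstract does not give the exact measure used by ITA, so the code below assumes one plausible reading: an abstraction is a many-to-one mapping from concrete to abstract attribute values, and its cost is the increase in the conditional entropy of the target class given the attribute. The function names, the toy data, and the `max_loss` threshold are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def conditional_entropy(values, labels):
    """H(class | attribute): class entropy averaged over attribute values."""
    groups = defaultdict(list)
    for v, c in zip(values, labels):
        groups[v].append(c)
    total = len(labels)
    return sum(len(g) / total * entropy(g) for g in groups.values())


def information_loss(values, labels, abstraction):
    """Increase in H(class | attribute) after mapping each concrete value
    to its abstract value (assumed measure, not the paper's definition)."""
    abstract_values = [abstraction[v] for v in values]
    return conditional_entropy(abstract_values, labels) - conditional_entropy(values, labels)


def select_abstraction(values, labels, candidates, max_loss):
    """Return the name of the candidate with the smallest loss, or None if
    even the best candidate exceeds the user-given bound max_loss."""
    losses = {name: information_loss(values, labels, m) for name, m in candidates.items()}
    best = min(losses, key=losses.get)
    return best if losses[best] <= max_loss else None


# Toy data (hypothetical): one attribute and a binary target class.
jobs = ["nurse", "doctor", "teacher", "professor", "nurse", "teacher"]
target = ["yes", "yes", "no", "no", "yes", "no"]

# Two hypothetical candidate abstractions for the same attribute.
candidates = {
    "by_sector": {"nurse": "medical", "doctor": "medical",
                  "teacher": "education", "professor": "education"},
    "by_seniority": {"nurse": "junior", "teacher": "junior",
                     "doctor": "senior", "professor": "senior"},
}

chosen = select_abstraction(jobs, target, candidates, max_loss=0.1)
print(chosen)  # by_sector
```

In this toy example, grouping by sector keeps every abstract value pure with respect to the target class (loss of 0 bits), while grouping by seniority mixes the classes evenly (loss of 1 bit), so only the first stays within the bound.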


Bibliographic details

Source
Theoretical Computer Science | 2003, Issue 2 | 30 pages
Authors
Yoshimitsu Kudoh; Makoto Haraguchi; Yoshiaki Okubo
Original format: PDF
Language: English
Chinese Library Classification: Theory, methods
Keywords
Data mining; Machine learning; Abstraction; Classification
Date added: 2022-08-18 18:54:00

Similar literature

1. Data abstractions for decision tree induction [J]. Yoshimitsu Kudoh, Makoto Haraguchi, Yoshiaki Okubo. Theoretical computer science. 2003, Issue 2
2. Date Classification through integration of Sequential process involving Data cleaning, attribute oriented induction, Relevance analysis as preprocessor to induction of decision tree USING RELATIONAL DATABASE [J]. AMIT THAKKAR, Y P KOSTA. International Journal of Engineering Science and Technology. 2011, Issue 2
3. Decision Tree-Based Data Mining and Rule Induction for Identifying High Quality Groundwater Zones to Water Supply Management: a Novel Hybrid Use of Data Mining and GIS [J]. Jeihouni Mehrdad, Toomanian Ara, Mansourian Ali. Water Resources Management. 2020, Issue 1
4. Ontology-Driven Induction of Decision Trees at Multiple Levels of Abstraction [C]. Jun Zhang, Adrian Silvescu, Vasant Honavar. International Symposium on Abstraction, Reformulation, and Approximation. 2002
5. Knowledge discovery in databases with joint decision outcomes: A decision-tree induction approach. [D]. Chang, Namsik. 1995
6. Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data [O]. Rodrigo C Barros, Ana T Winck, Karina S Machado, 2012
7. Data abstractions for decision tree induction [O]. Kudoh Yoshimitsu, Haraguchi Makoto, Okubo Yoshiaki. 2003
