Theoretical Computer Science

Data abstractions for decision tree induction

Abstract

When the descriptions of data values in a database are too concrete or too detailed, the computational cost of discovering useful knowledge from the database generally increases, and the discovered knowledge tends to become complicated. Data abstraction seems useful for resolving this kind of problem: abstraction yields a smaller and more general database, from which we can quickly extract more abstract knowledge that is expected to be easier to understand. In general, however, several abstractions are possible, so we have to select carefully the one according to which the original database is generalized. An inadequate selection would reduce the accuracy of the extracted knowledge. From this point of view, we propose in this paper a method of selecting an appropriate abstraction from the possible ones, assuming that our task is to construct a decision tree from a relational database. Suppose that, for each attribute in a relational database, we have a class of possible abstractions for the attribute values. As an appropriate abstraction for each attribute, we prefer one such that, even after the abstraction, the distribution of target classes needed to perform our classification task is preserved within an acceptable error range given by the user. With the selected abstractions, the original database can be transformed into a small generalized database written in abstract values. We can therefore expect that the decision tree constructed from the generalized database is much smaller than the one constructed from the original database. Furthermore, such a size reduction can be justified under some theoretical assumptions. The appropriateness of an abstraction is precisely defined in terms of standard information theory. We therefore call our abstraction framework Information Theoretical Abstraction.
We show some experimental results obtained by ITA, a system that implements our abstraction method. These results verify that our method is very effective in reducing the size of the constructed decision tree without making classification errors much worse.
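
The selection idea described in the abstract can be illustrated with a small sketch. The abstract does not state the exact criterion used by ITA, so the code below makes an assumption: it measures how well an abstraction preserves the class distribution by the loss in information gain, and among the candidates whose loss stays within a user-given bound it picks the one with the fewest abstract values. All names (`entropy`, `information_gain`, `select_abstraction`, `max_loss`) and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in class entropy obtained by grouping rows on the attribute values."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_abstraction(values, labels, candidates, max_loss):
    """Return (name, loss) of the coarsest acceptable candidate, or None.

    `candidates` maps a name to a dict {concrete value -> abstract value}.
    A candidate is acceptable when its information-gain loss is at most
    `max_loss` (an assumed stand-in for the paper's acceptable error range).
    """
    base = information_gain(values, labels)
    acceptable = []
    for name, mapping in candidates.items():
        abstracted = [mapping[v] for v in values]
        loss = base - information_gain(abstracted, labels)
        if loss <= max_loss:
            acceptable.append((len(set(mapping.values())), loss, name))
    if not acceptable:
        return None
    _, loss, name = min(acceptable)  # fewest abstract values first, then smallest loss
    return name, loss


# Toy data (hypothetical): one attribute and a binary target class.
cities = ["tokyo", "osaka", "paris", "lyon"]
classes = ["yes", "yes", "no", "no"]
candidates = {
    # Grouping by country keeps the class distribution intact.
    "by_country": {"tokyo": "japan", "osaka": "japan", "paris": "france", "lyon": "france"},
    # An arbitrary grouping that mixes the classes and loses all information.
    "arbitrary": {"tokyo": "a", "paris": "a", "osaka": "b", "lyon": "b"},
}

print(select_abstraction(cities, classes, candidates, max_loss=0.1))
```

In this toy case the country-level grouping separates the classes exactly as the concrete values do, so it is the only candidate that passes the bound; the arbitrary grouping is rejected. A real implementation would apply such a test to every attribute of the relation before building the tree.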