Extended Bayes and skewing: On two improvements to standard induction-based learning algorithms.

Abstract

We address improvements to Naive Bayes (NB) and Decision Trees, two standard induction-based methods for solving classification problems. The goal of these improvements is to extract more information from the training examples, in order to more accurately classify new examples.

The first part of this thesis presents a new learning algorithm, Extended Bayes (EB), which is an extension of NB. NB classifies new examples using conditional probabilities computed from the training data. It is simple, fast, and widely applicable. EB retains these positive properties of NB, while equaling or surpassing the predictive power of NB as measured on a wide variety of benchmark UC-Irvine datasets. EB is based on two ideas, which interact. The first is to find sets of seemingly dependent attributes and to add them as new attributes. The second is to exploit "zeroes", i.e., the negative evidence provided by attribute values that do not occur in particular classes in the training data. Zeroes are handled in Naive Bayes by smoothing (substituting a small positive value). In contrast, EB uses them as evidence that a potential class labeling may be wrong.

The second part of the thesis presents a theoretical analysis of skewing, a recent technique for improving the performance of standard decision tree algorithms [42]. Decision tree algorithms use the training data to build a decision tree that computes a function mapping examples to class labels. Standard decision tree algorithms perform poorly in learning certain "difficult" functions, such as parity, when irrelevant attributes are present, because of an inability to distinguish between relevant and irrelevant attributes. While experimental evidence indicates that skewing can remedy this problem, prior to the work in this thesis, there was almost no analysis of when and why skewing worked. We prove that, in an idealized setting, skewing can always identify relevant attributes. We also present an analysis of a variant of skewing called sequential skewing, and prove results concerning properties of the class of "difficult" functions.
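As an illustration of the zero-handling contrast described above, here is a minimal Python sketch. The function names, the smoothing constant alpha, and the interfaces are assumptions made for illustration, not the thesis's implementation: standard NB smooths a zero count into a small positive probability, while an EB-style check counts zeroes as negative evidence against a candidate class.

```python
from collections import Counter, defaultdict

def train(examples, labels):
    # Count class frequencies and, per (class, attribute index), value frequencies.
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)
    for x, y in zip(examples, labels):
        for i, v in enumerate(x):
            value_counts[(y, i)][v] += 1
    return class_counts, value_counts

def nb_score(x, c, class_counts, value_counts, alpha=1.0):
    # Standard NB with Laplace smoothing: a zero count becomes a small
    # positive probability instead of vetoing the class outright.
    n = sum(class_counts.values())
    score = class_counts[c] / n
    for i, v in enumerate(x):
        counts = value_counts[(c, i)]
        score *= (counts[v] + alpha) / (sum(counts.values()) + alpha * (len(counts) + 1))
    return score

def eb_zero_evidence(x, c, value_counts):
    # EB-style use of zeroes: the number of attribute values in x that never
    # co-occurred with class c in training -- negative evidence that labeling
    # x with class c may be wrong.
    return sum(1 for i, v in enumerate(x) if value_counts[(c, i)][v] == 0)

# Tiny demo: "hot" never appears with class "yes" in training, so the smoothed
# NB score stays positive while the EB-style check flags one zero.
X = [("sunny", "hot"), ("rainy", "cool"), ("sunny", "cool")]
y = ["no", "no", "yes"]
cc, vc = train(X, y)
print(nb_score(("sunny", "hot"), "yes", cc, vc))      # small positive value
print(eb_zero_evidence(("sunny", "hot"), "yes", vc))  # 1
```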
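The skewing part can likewise be made concrete. Under the uniform distribution, every attribute of a parity target has zero information gain, so relevant and irrelevant attributes are indistinguishable to a standard gain-based splitter. The sketch below follows the general idea of skewing rather than the thesis's exact procedure; the weighting scheme and the constant p are assumptions. It reweights each example toward randomly chosen favored attribute values and computes gain under those weights.

```python
import random
from math import log2

def skew_weights(examples, favored, p=0.75):
    # Weight each example by how well it matches the randomly chosen favored
    # values: each matching attribute contributes p, each mismatch 1 - p.
    weights = []
    for x in examples:
        w = 1.0
        for v, f in zip(x, favored):
            w *= p if v == f else 1.0 - p
        weights.append(w)
    return weights

def weighted_gain(examples, labels, weights, attr):
    # Information gain of splitting on `attr`, with each example counted
    # according to its skew weight rather than uniformly.
    def entropy(idx):
        total = sum(weights[i] for i in idx)
        if total == 0.0:
            return 0.0
        h = 0.0
        for y in set(labels[i] for i in idx):
            p_y = sum(weights[i] for i in idx if labels[i] == y) / total
            h -= p_y * log2(p_y)
        return h

    all_idx = list(range(len(examples)))
    gain = entropy(all_idx)
    total = sum(weights)
    for v in set(x[attr] for x in examples):
        idx = [i for i in all_idx if examples[i][attr] == v]
        gain -= (sum(weights[i] for i in idx) / total) * entropy(idx)
    return gain

# Tiny demo on 2-bit parity with one irrelevant attribute: under uniform
# weights every attribute has zero gain; under a random skew the relevant
# attributes (0 and 1) show nonzero gain while attribute 2 stays at zero.
examples = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [a ^ b for a, b, c in examples]
favored = [random.choice((0, 1)) for _ in range(3)]
weights = skew_weights(examples, favored)
print([round(weighted_gain(examples, labels, weights, j), 3) for j in range(3)])
```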