Fast factored density estimation and compression with Bayesian networks.

Abstract

Many important data analysis tasks can be addressed by formulating them as probability estimation problems. For example, a popular general approach to automatic classification problems is to learn a probabilistic model of each class from data in which the classes are known, and then use Bayes's rule with these models to predict the correct classes of other data for which they are not known. Anomaly detection and scientific discovery tasks can often be addressed by learning probability models over possible events and then looking for events to which these models assign low probabilities. Many data compression algorithms such as Huffman coding and arithmetic coding rely on probabilistic models of the data stream in order to achieve high compression rates.

In this thesis we examine several aspects of probability estimation algorithms. In particular, we focus on the automatic learning and use of probability models based on Bayesian networks, a convenient formalism in which the probability estimation task is split into many simpler subtasks. We also emphasize computational efficiency. First, we provide Bayesian network-based algorithms for losslessly compressing large discrete datasets. We show that these algorithms can produce compression ratios dramatically higher than those achieved by popular compression programs such as gzip or bzip2, yet still maintain megabyte-per-second decoding speeds on well-aged conventional PCs. Next, we provide algorithms for quickly learning Bayesian network-based probability models over domains with both discrete and continuous variables. We show how recently developed methods for quickly learning Gaussian mixture models from data [Moo99] can be used to learn Bayesian networks modeling complex nonlinear relationships over dozens of variables from thousands of datapoints in a practical amount of time. Finally, we explore a large space of tree-based density learning algorithms, and show that they can be used to quickly learn Bayesian networks that can provide accurate density estimates and that are fast to evaluate.
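
The link the abstract draws between factored density estimation and compression can be illustrated with a minimal sketch. It is not taken from the thesis: the three-variable network, its variable names, and every probability value below are hypothetical, chosen only to show how a Bayesian network splits a joint probability into simpler conditional factors, and how an ideal entropy coder (such as an arithmetic coder) would spend about -log2 p bits on a record the model assigns probability p.

```python
import math

# Hypothetical network: rain -> sprinkler, and (rain, sprinkler) -> wet.
# All numbers are made-up illustrative values, not results from the thesis.
P_RAIN = {True: 0.2, False: 0.8}

# P(sprinkler | rain), indexed as P_SPRINKLER[rain][sprinkler]
P_SPRINKLER = {
    True: {True: 0.01, False: 0.99},
    False: {True: 0.4, False: 0.6},
}

# P(wet | rain, sprinkler), indexed as P_WET[(rain, sprinkler)][wet]
P_WET = {
    (True, True): {True: 0.99, False: 0.01},
    (True, False): {True: 0.8, False: 0.2},
    (False, True): {True: 0.9, False: 0.1},
    (False, False): {True: 0.05, False: 0.95},
}


def joint_probability(rain, sprinkler, wet):
    """Factored joint: P(rain) * P(sprinkler | rain) * P(wet | rain, sprinkler)."""
    return (
        P_RAIN[rain]
        * P_SPRINKLER[rain][sprinkler]
        * P_WET[(rain, sprinkler)][wet]
    )


def ideal_code_length_bits(rain, sprinkler, wet):
    """Bits an ideal entropy coder would need for this record under the model."""
    return -math.log2(joint_probability(rain, sprinkler, wet))


if __name__ == "__main__":
    for record in [(False, False, False), (True, True, False)]:
        p = joint_probability(*record)
        bits = ideal_code_length_bits(*record)
        print(record, f"p={p:.6f}", f"bits={bits:.2f}")
```

Under this toy model, records the network considers likely get short codes and unlikely ones get long codes, which is also why low-probability records are natural candidates for anomaly detection.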

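Bibliographic details

  • Author: Davies, Scott Tor
  • Author affiliation: Carnegie Mellon University
  • Degree-granting institution: Carnegie Mellon University
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2002
  • Pages: 181 p.
  • Total pages: 181
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Automation technology, computer technology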
