
Small sample learning of multivariate distributions with compositional graphical models.



Abstract

We address the problem of finding an optimal approximation to a multivariate probability distribution starting from a finite set of samples. We propose a compositional approach that uses probabilistic graphical models and combines structure and parameter learning, adapting model complexity to sample size. In particular, we target efficient learning in small-sample regimes by introducing biases that restrict our model class, constraining the set of admissible distributions.

We begin by selecting a set of low-dimensional distributions, which we call "primitives". These primitives typically contain very small numbers of variables (e.g. pairs and triplets) and can be reliably estimated from data even when samples are small. We define a set of merging rules to iteratively assemble these primitives into increasingly large compositions, which we use to specify a distribution over the whole set of variables. We define our model class as the set of all admissible distributions that can be reached through a sequence of merges, and we show that every distribution of this type is uniquely identified with a directed acyclic graph of primitives.

Each primitive has an associated "score", defined as the likelihood-ratio gain obtained by fusing its variables into the primitive distribution rather than treating them as independent. The global score of any given composition decomposes into a sum of local scores, one for each participating primitive. Since all scores can be precomputed, parameter estimation (building primitives) is separated from model construction (competitive assembly). Furthermore, the structure-search problem (optimizing over all valid compositions to maximize the score) can be solved using integer linear programming.

We use the name Competitive Assembly of Marginals (CAM) to refer to models that are learned within this general framework.
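The "score" of a primitive can be illustrated concretely. Below is a minimal sketch, assuming discrete data and plug-in (empirical) maximum-likelihood estimates; the function name `primitive_score` is hypothetical and not from the dissertation. It computes the log-likelihood of the fitted joint distribution minus the log-likelihood under the independence model, i.e. the gain from fusing the variables into a single primitive.

```python
import numpy as np
from collections import Counter

def primitive_score(samples):
    """Plug-in log-likelihood-ratio gain of modelling the columns of
    `samples` jointly versus as independent variables (hypothetical
    illustration; equals n times the empirical mutual information for
    a pair of variables)."""
    n = len(samples)
    # log-likelihood under the fitted joint distribution
    joint = Counter(map(tuple, samples))
    ll_joint = sum(c * np.log(c / n) for c in joint.values())
    # log-likelihood under the product of fitted marginals
    ll_indep = 0.0
    for j in range(samples.shape[1]):
        marg = Counter(samples[:, j])
        ll_indep += sum(c * np.log(c / n) for c in marg.values())
    return ll_joint - ll_indep

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=200)
y = (x ^ (rng.random(200) < 0.1)).astype(int)  # y strongly depends on x
z = rng.integers(0, 2, size=200)               # z is independent of x
data_xy = np.column_stack([x, y])
data_xz = np.column_stack([x, z])
# the dependent pair earns a much larger score than the independent one
print(primitive_score(data_xy), primitive_score(data_xz))
```

Because the plug-in score is proportional to empirical mutual information, it is always non-negative and grows with both sample size and strength of dependence, which is what makes a strongly coupled pair "win" a merge over unrelated variables.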
We present several subfamilies of CAM models that incorporate diverse structural and parametric constraints. We validate the advantages of our method for small samples using both synthetic and real data. In terms of practical applications, we illustrate how our models can be used to infer semantic networks from text and to reconstruct networks of molecular interactions in computational biology.
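The competitive-assembly step above can be caricatured in a few lines. This is a toy stand-in for the integer-linear-programming search, under the simplifying assumption that candidate primitives must be variable-disjoint; the variable names and scores are invented for illustration and are not from the dissertation.

```python
from itertools import combinations

def assemble(primitives):
    """Exhaustively select a set of variable-disjoint primitives whose
    precomputed scores sum to a maximum (brute-force stand-in for the
    ILP-based structure search; assumes disjointness for simplicity)."""
    best, best_score = [], 0.0
    for r in range(1, len(primitives) + 1):
        for subset in combinations(primitives, r):
            vars_used = [v for vs, _ in subset for v in vs]
            if len(vars_used) != len(set(vars_used)):
                continue  # competing primitives must not share variables
            total = sum(s for _, s in subset)
            if total > best_score:
                best, best_score = list(subset), total
    return best, best_score

# hypothetical precomputed scores for candidate pairs and one triplet
cands = [(("a", "b"), 4.0), (("b", "c"), 3.5),
         (("c", "d"), 2.5), (("a", "b", "c"), 6.0)]
sel, score = assemble(cands)
# the two disjoint pairs beat the overlapping triplet (4.0 + 2.5 > 6.0)
print(sel, score)
```

Since every score is local and precomputed, the search touches only the combinatorial selection problem, which is exactly why it can be cast as an integer linear program in the full framework.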

Record details

  • Author

    Sanchez-Vega, Francisco.

  • Affiliation

    The Johns Hopkins University.

  • Awarding institution: The Johns Hopkins University.
  • Subjects: Applied Mathematics; Mathematics; Statistics.
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 223 p.
  • Total pages: 223
  • Format: PDF
  • Language: eng

  • Added to database: 2022-08-17 11:42:26

