We address the problem of finding an optimal approximation to a multivariate probability distribution starting from a finite set of samples. We propose a compositional approach that uses probabilistic graphical models and combines structure and parameter learning. Model complexity is adapted to sample size. In particular, we target efficient learning in small-sample regimes by introducing biases that restrict our model class, constraining the set of admissible distributions.

We begin by selecting a set of low-dimensional distributions, which we call "primitives". These primitives typically involve only a few variables (e.g., pairs and triplets) and can therefore be reliably estimated from data even when samples are scarce. We define a set of merging rules to iteratively assemble these primitives into increasingly large compositions, which we use to specify a distribution over the whole set of variables. We define our model class as the set of all admissible distributions that can be reached through a sequence of merges, and we show that every distribution of this type is uniquely identified with a directed acyclic graph of primitives.

Each primitive has an associated "score", defined as the likelihood ratio measuring the gain from fusing the individual variables into the primitive distribution, relative to treating them as independent. The global score of any composition decomposes into a sum of local scores, one per participating primitive. Since all scores can be precomputed, parameter estimation (building primitives) is separated from model construction (competitive assembly). Furthermore, the structure search problem (i.e., optimizing over all decompositions for valid compositions that maximize the score) can be solved using integer linear programming.

We use the name Competitive Assembly of Marginals (CAM) to refer to models learned within this general framework.
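As an illustration of the scoring idea (a minimal sketch, not the paper's actual code), the score of a pairwise primitive can be computed as the log-likelihood ratio of the joint empirical distribution against the product of its marginals; for two discrete variables this equals the sample size times their empirical mutual information. The function name `pair_score` and the toy data are our own illustrative assumptions.

```python
# Hedged sketch: score of a pairwise primitive as a log-likelihood ratio,
# comparing the fitted joint distribution to the independence model.
from collections import Counter
from math import log

def pair_score(samples):
    """samples: list of (x, y) observations for two discrete variables.
    Returns sum_i log[ p_hat(x_i, y_i) / (p_hat(x_i) * p_hat(y_i)) ],
    i.e., N times the empirical mutual information."""
    n = len(samples)
    joint = Counter(samples)                 # joint empirical counts
    px = Counter(x for x, _ in samples)      # marginal counts of x
    py = Counter(y for _, y in samples)      # marginal counts of y
    score = 0.0
    for (x, y), c in joint.items():
        # each observed cell contributes c * log-likelihood-ratio
        score += c * log((c / n) / ((px[x] / n) * (py[y] / n)))
    return score

dependent = [(0, 0), (0, 0), (1, 1), (1, 1)]    # perfectly coupled pair
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]  # uniform, independent pair
print(round(pair_score(dependent), 4))   # 4 * log 2 ≈ 2.7726
print(round(pair_score(independent), 4)) # 0.0
```

A positive score rewards merging the two variables into a primitive; a score near zero indicates the independence model fits as well, so the merge is not competitive.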
We present several subfamilies of CAM models that incorporate diverse structural and parametric constraints. We validate the advantages of our method in small-sample settings using both synthetic and real data. As practical applications, we illustrate how our models can be used to infer semantic networks from text and to reconstruct networks of molecular interactions in computational biology.