We address the problem of finding an optimal approximation to a multivariate probability distribution starting from a finite set of samples. We propose a compositional approach that uses probabilistic graphical models and combines structure and parameter learning. Model complexity is adapted to sample size. In particular, we target efficient learning in small-sample regimes by introducing biases that restrict our model class, constraining the set of admissible distributions.

We begin by selecting a set of low-dimensional distributions, which we call "primitives". These primitives typically involve only a few variables (e.g., pairs and triplets) and can therefore be reliably estimated from data even when samples are scarce. We define a set of merging rules to iteratively assemble these primitives into increasingly large compositions, which we use to specify a distribution over the whole set of variables. We define our model class as the set of all admissible distributions that can be reached through a sequence of merges, and we show that every distribution of this type is uniquely identified with a directed acyclic graph of primitives.

Each primitive has an associated "score", defined as the likelihood ratio measuring the gain from fusing the individual variables into the primitive distribution, relative to treating them as independent. The global score of any composition decomposes into a sum of local scores, one per participating primitive. Since all scores can be precomputed, parameter estimation (building primitives) is separated from model construction (competitive assembly). Furthermore, the structure search problem (i.e., optimizing over all decompositions for valid compositions that maximize the score) can be solved using integer linear programming.

We use the name Competitive Assembly of Marginals (CAM) to refer to models learned within this general framework.
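As an illustration of the scoring idea (a minimal sketch, not the paper's actual code), the score of a pairwise primitive can be computed as the log-likelihood ratio of the joint empirical distribution against the product of its marginals; for two discrete variables this equals the sample size times their empirical mutual information. The function name `pair_score` and the toy data are our own illustrative assumptions.

```python
# Hedged sketch: score of a pairwise primitive as a log-likelihood ratio,
# comparing the fitted joint distribution to the independence model.
from collections import Counter
from math import log

def pair_score(samples):
    """samples: list of (x, y) observations for two discrete variables.
    Returns sum_i log[ p_hat(x_i, y_i) / (p_hat(x_i) * p_hat(y_i)) ],
    i.e., N times the empirical mutual information."""
    n = len(samples)
    joint = Counter(samples)                 # joint empirical counts
    px = Counter(x for x, _ in samples)      # marginal counts of x
    py = Counter(y for _, y in samples)      # marginal counts of y
    score = 0.0
    for (x, y), c in joint.items():
        # each observed cell contributes c * log-likelihood-ratio
        score += c * log((c / n) / ((px[x] / n) * (py[y] / n)))
    return score

dependent = [(0, 0), (0, 0), (1, 1), (1, 1)]    # perfectly coupled pair
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]  # uniform, independent pair
print(round(pair_score(dependent), 4))   # 4 * log 2 ≈ 2.7726
print(round(pair_score(independent), 4)) # 0.0
```

A positive score rewards merging the two variables into a primitive; a score near zero indicates the independence model fits as well, so the merge is not competitive.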
We present several subfamilies of CAM models that incorporate diverse structural and parametric constraints. We validate the advantages of our method in small-sample settings using both synthetic and real data. As practical applications, we illustrate how our models can be used to infer semantic networks from text and to reconstruct networks of molecular interactions in computational biology.