Disseminating information from a k-way cross-classification of non-negative counts n typically amounts to releasing various lower-order marginals, each corresponding to a subset of the k variables. This thesis exploits the theory of graphical models to characterize the classes of tables induced by several possibly overlapping marginals. Any set of released marginals can be used to define an independence graph. In the special case when these marginals correspond to the minimal sufficient statistics of a decomposable graphical model, the theory yields explicit formulas for sharp upper and lower bounds on the cell entries of the tables in the class. In addition, these bound results are related to the Markov bases used to induce probability distributions over the class. In the decomposable case, simple "data swaps" are the only moves required to construct a Markov basis that links all the contingency tables in the class. The approach to computing bounds and generating Markov bases developed in the thesis generalizes to the case when the released marginals correspond to a reducible independence graph. For an arbitrary set of released marginals, explicit formulas for sharp bounds are not available, and some form of iterative algorithm is required. The method developed in this thesis is a generalization of the shuttle algorithm proposed by Buzzigoli and Giusti. This generalized shuttle algorithm can be modified to enumerate all the tables in the class and to find a controlled rounding of a table of arbitrary dimension. The last part of the thesis studies probability distributions defined on spaces of tables induced by a set of marginal totals. Through examples and discussion, the thesis illustrates the practical value of the bound and distribution results for assessing the disclosure risk of categorical data.
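As an illustration of the explicit bounds in the decomposable case, the following is a minimal sketch for the simplest such setting: a three-way table on variables A, B, C with released marginals [AB] and [BC], whose separator is [B]. The abstract does not reproduce the thesis's formulas, so the sketch assumes the standard Fréchet-type form for this case, in which the upper bound on a cell is the smaller of the two marginal entries containing it and the lower bound is max(0, n_ab + n_bc - n_b). The function name and the example counts are hypothetical.

```python
import numpy as np


def decomposable_bounds_3way(n_ab, n_bc):
    """Cell bounds for a 3-way table given its [AB] and [BC] marginals.

    Assumed (not taken from the thesis text): the Frechet-type bounds
        max(0, n_ab + n_bc - n_b) <= n_abc <= min(n_ab, n_bc),
    where n_b is the separator marginal [B].
    """
    n_ab = np.asarray(n_ab)
    n_bc = np.asarray(n_bc)

    # The separator marginal [B] must agree between the two released tables.
    n_b = n_ab.sum(axis=0)
    if not np.array_equal(n_b, n_bc.sum(axis=1)):
        raise ValueError("released marginals disagree on the B totals")

    # Broadcast every marginal to the full (A, B, C) shape.
    ab = n_ab[:, :, None]
    bc = n_bc[None, :, :]
    b = n_b[None, :, None]

    upper = np.minimum(ab, bc)
    lower = np.maximum(0, ab + bc - b)
    return lower, upper


# Hypothetical released marginals for a 2 x 2 x 2 table.
n_ab = [[3, 1], [2, 4]]
n_bc = [[4, 1], [2, 3]]
lower, upper = decomposable_bounds_3way(n_ab, n_bc)
```

With these example marginals, the cell (a, b, c) = (0, 0, 0) is bounded between 2 and 3. A narrow interval like this is the kind of result that signals elevated disclosure risk for the corresponding cell.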