Sampling contingency tables given sets of marginals and/or conditionals in the context of statistical disclosure limitation.

Abstract

Federal agencies and other organizations often publish data summarized in arrays of non-negative integers called contingency tables. When such data are released, sensitive information about individuals must be protected from disclosure. In statistical disclosure limitation, we must balance disclosure risk against the data utility needed to make valid statistical inferences. One way to achieve this balance is to release only partial information about the original data. In practice, many agencies release data summarized as marginal sums or conditional probabilities. Sampling methods for multi-way contingency tables given a set of observed marginal sums have been studied in a variety of ways, yet there is almost no literature on sampling tables given a set of observed conditional probabilities. In this thesis, we focus on sets of conditional probabilities rather than marginal sums. We propose MCMC simulation schemes, coupled with tools from algebraic statistics, to sample from the set of possible tables consistent with the observed conditional values. We also propose a simple extension to the case where a combination of marginal totals and conditional values is observed. These algorithms can be used to compute posterior distributions and to assess data utility and disclosure risk in the context of statistical disclosure limitation. We demonstrate the proposed algorithms with simple examples and discuss their advantages and disadvantages. In addition, the proposed sampling algorithms can be used to release synthetic contingency tables, and we study both the disclosure risk and the data utility associated with such synthetic tabular data releases.
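The abstract does not spell out the sampling schemes themselves. For orientation only, here is a minimal Python sketch of the well-studied marginal-sum setting that the thesis contrasts with: a Metropolis chain over two-way tables with fixed row and column sums, driven by the Diaconis-Sturmfels "basic moves" (a known Markov basis for this case). It is not the thesis's algorithm for conditional probabilities, and the names `margins`, `basic_move_step` and `sample_tables` are hypothetical.

```python
import random
from typing import List, Tuple

Table = List[List[int]]


def margins(table: Table) -> Tuple[List[int], List[int]]:
    """Return (row sums, column sums) of a two-way table."""
    return [sum(row) for row in table], [sum(col) for col in zip(*table)]


def basic_move_step(table: Table, rng: random.Random) -> Table:
    """One Metropolis step targeting the uniform distribution over all
    non-negative integer tables sharing the row and column sums of `table`."""
    n_rows, n_cols = len(table), len(table[0])
    if n_rows < 2 or n_cols < 2:
        raise ValueError("basic moves need at least 2 rows and 2 columns")
    # Propose a basic move: two distinct rows, two distinct columns and a sign.
    i1, i2 = rng.sample(range(n_rows), 2)
    j1, j2 = rng.sample(range(n_cols), 2)
    eps = rng.choice((1, -1))
    proposal = [row[:] for row in table]
    # The +eps/-eps pattern on a 2x2 minor leaves every row and column sum unchanged.
    proposal[i1][j1] += eps
    proposal[i2][j2] += eps
    proposal[i1][j2] -= eps
    proposal[i2][j1] -= eps
    # The proposal is symmetric and the target is uniform, so a feasible
    # proposal is always accepted; one with a negative cell is rejected
    # and the chain stays where it is.
    if min(proposal[i1][j1], proposal[i2][j2],
           proposal[i1][j2], proposal[i2][j1]) < 0:
        return table
    return proposal


def sample_tables(start: Table, n_steps: int, seed: int = 0) -> List[Table]:
    """Run the chain for `n_steps` steps from `start`; return every visited state."""
    rng = random.Random(seed)
    current = [row[:] for row in start]
    samples = []
    for _ in range(n_steps):
        current = basic_move_step(current, rng)
        samples.append(current)
    return samples


if __name__ == "__main__":
    start = [[3, 1], [2, 4]]
    draws = sample_tables(start, n_steps=1000, seed=1)
    distinct = {tuple(map(tuple, t)) for t in draws}
    print(len(distinct), "distinct tables visited; margins:", margins(start))
```

For the table [[3, 1], [2, 4]] (row sums 4 and 6, column sums 5 and 5), the feasible set is the five tables [[a, 4 - a], [5 - a, 1 + a]] for a = 0, ..., 4, and the chain moves only among these. Fixing conditional probabilities instead of margins changes the linear constraints on the cells, which is why a different Markov basis is needed in the setting the thesis studies.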

Bibliographic record

  • Author: Lee, Juyoun
  • Affiliation: The Pennsylvania State University
  • Degree-granting institution: The Pennsylvania State University
  • Subject: Statistics
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 210
  • Original format: PDF
  • Language: English