Journal of the American Statistical Association

Sampling With Synthesis: A New Approach for Releasing Public Use Census Microdata



Abstract

Many statistical agencies disseminate samples of census microdata, that is, data on individual records, to the public. Before releasing the microdata, agencies typically alter identifying or sensitive values to protect data subjects' confidentiality, for example by coarsening, perturbing, or swapping data. These standard disclosure limitation techniques distort relationships and distributional features in the original data, especially when applied with high intensity. Furthermore, it can be difficult for analysts of the masked public use data to adjust inferences for the effects of the disclosure limitation. Motivated by these shortcomings, we propose an approach to census microdata dissemination called sampling with synthesis. The basic idea is to replace the identifying or sensitive values in the census with multiple imputations, and release samples from these multiply-imputed populations. We demonstrate that sampling with synthesis can improve the quality of public use data relative to sampling followed by standard statistical disclosure limitation; simulation results showing this are available online as supplemental material. We derive methods for analyzing the multiple datasets generated by sampling with synthesis. We present algorithms for selecting which census values to synthesize based on considerations of disclosure risk and data utility. We illustrate sampling with synthesis on a population constructed with data from the U.S. Current Population Survey.
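
The basic idea described in the abstract (replace sensitive census values with multiple imputations, then release samples from the multiply-imputed populations) can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `sampling_with_synthesis`, the variables `group` and `income`, the imputation model (independent normal draws within each group, with parameters fixed at their census estimates), and the choice to draw a fresh simple random sample from each synthetic population. A full multiple-imputation procedure would also draw the model parameters from their posterior distribution.

```python
import numpy as np

def sampling_with_synthesis(group, sensitive, m, n, rng):
    """Toy sketch: synthesize `sensitive` m times, then sample n records from each copy.

    group     : integer array, a non-sensitive categorical variable released unchanged
    sensitive : float array, the variable whose values are replaced by imputations
    The within-group normal model is an illustrative assumption, not the
    paper's model. Each group needs at least two records so that the
    standard deviation is defined.
    """
    N = len(sensitive)
    released = []
    for _ in range(m):
        # Build one synthetic population: every sensitive value is replaced
        # by a draw from the model fitted to the full census.
        synthetic = np.empty(N)
        for g in np.unique(group):
            mask = group == g
            mu = sensitive[mask].mean()
            sd = sensitive[mask].std(ddof=1)
            synthetic[mask] = rng.normal(mu, sd, size=mask.sum())
        # Draw a simple random sample (without replacement) from this copy.
        idx = rng.choice(N, size=n, replace=False)
        released.append({"group": group[idx], "sensitive": synthetic[idx]})
    return released

# Hypothetical census of 1000 records with three groups.
rng = np.random.default_rng(0)
group = rng.integers(0, 3, size=1000)
income = 30000 + 10000 * group + rng.normal(0, 5000, size=1000)

datasets = sampling_with_synthesis(group, income, m=5, n=100, rng=rng)
```

Each element of `datasets` is one released sample. An analyst would estimate the quantity of interest in each sample and then combine the results with the combining rules derived in the paper, which are not reproduced here.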
