Disclosure Risk Evaluation for Fully Synthetic Categorical Data

Abstract

We present an approach for evaluating disclosure risks for fully synthetic categorical data. The basic idea is to compute probability distributions of unknown confidential data values given the synthetic data and assumptions about intruder knowledge. We use a "worst-case" scenario of an intruder knowing all but one of the records in the confidential data. To create the synthetic data, we use a Dirichlet process mixture of products of multinomial distributions, which is a Bayesian version of a latent class model. In addition to generating synthetic data with high utility, the likelihood function admits simple and convenient approximations to the disclosure risk probabilities via importance sampling. We illustrate the disclosure risk computations by synthesizing a subset of data from the American Community Survey.
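To make the worst-case calculation concrete, here is a minimal Python sketch under a deliberately simplified model. It assumes a single categorical variable, a symmetric Dirichlet prior, and synthetic values drawn from the posterior predictive distribution given the confidential data. This is a stand-in, not the Dirichlet process mixture of products of multinomials described above, and because it has a closed form it needs no importance sampling. The function names, the `prior_alpha` parameter, and the toy data are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def log_dirichlet_multinomial(z_counts, alpha):
    """Log probability of a sequence of categorical draws with counts z_counts,
    after integrating out theta ~ Dirichlet(alpha)."""
    a0 = sum(alpha)
    n = sum(z_counts)
    out = math.lgamma(a0) - math.lgamma(a0 + n)
    for a, z in zip(alpha, z_counts):
        out += math.lgamma(a + z) - math.lgamma(a)
    return out


def risk_posterior(known_records, synthetic, categories,
                   prior_alpha=1.0, intruder_prior=None):
    """Posterior over the one unknown confidential value.

    Assumed setting (simplified): the intruder knows every confidential
    record except one (known_records), sees the synthetic data, and knows
    the synthesizer is a Dirichlet-multinomial model with a symmetric prior.
    """
    known = Counter(known_records)
    z = Counter(synthetic)
    z_counts = [z[c] for c in categories]
    if intruder_prior is None:
        # Assumption: a uniform prior over the candidate values.
        intruder_prior = [1.0 / len(categories)] * len(categories)

    log_post = []
    for j, guess in enumerate(categories):
        # Dirichlet parameters if the unknown record equalled `guess`.
        alpha = [prior_alpha + known[c] + (1 if c == guess else 0)
                 for c in categories]
        log_post.append(math.log(intruder_prior[j])
                        + log_dirichlet_multinomial(z_counts, alpha))

    # Normalize on the log scale for numerical stability.
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return {c: w / total for c, w in zip(categories, weights)}


if __name__ == "__main__":
    # Toy data: two known records, ten synthetic records all equal to "a".
    print(risk_posterior(["a", "b"], ["a"] * 10, ["a", "b"]))
```

In this toy case the posterior probability that the unknown record is "a" works out to 6/7, because the synthetic data are far more likely when the confidential data contain two "a" values than when they contain one. With a richer synthesizer the same ratio of likelihoods has no closed form, which is where an approximation such as importance sampling comes in.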

Bibliographic details

Source
UNESCO Chair in Data Privacy International Conference on Privacy in Statistical Databases | 2014 | pp. 185-199 | 15 pages
Authors
Jingchen Hu; Jerome P. Reiter; Quanli Wang
Original format
PDF
Keywords
Bayesian; confidentiality; Dirichlet process; disclosure; microdata

Similar literature

1. Assessing disclosure risks for synthetic data with arbitrary intruder knowledge [J]. David McClure, Jerome P. Reiter. Statistical Journal of the IAOS: Journal of the International Association for Official Statistics. 2016, issue 1.
2. Generating partially synthetic geocoded public use data with decreased disclosure risk by using differential smoothing [J]. Quick Harrison, Holan Scott H., Wikle Christopher K. Journal of the Royal Statistical Society. 2018, issue pta3.
3. Multimethod synthetic data generation for confidentiality and measurement of disclosure risk [J]. Michael D. Larsen, Jennifer C. Huckett. International Journal of Information Privacy, Security and Integrity. 2012, issue 2a3.
4. Accounting for Intruder Uncertainty Due to Sampling When Estimating Identification Disclosure Risks in Partially Synthetic Data [C]. Joerg Drechsler, Jerome P. Reiter. Privacy in Statistical Databases. 2008.
5. Novel Approaches to Creating Synthetic Data from Multivariate Survey Data for Statistical Disclosure Control [D]. Chen, Allshine. 2020.
6. Daily activity locations k-anonymity for the evaluation of disclosure risk of individual GPS datasets [O]. Jue Wang, Mei-Po Kwan. 2020.
7. Bayesian Estimation of Disclosure Risks for Multiply Imputed, Synthetic Data [O]. Reiter, Jerome P., Wang, Quanli, Zhang, Biyuan. 2014.
8. Methodology for Identifying, Evaluating, and Controlling the Incremental Risk of Inadvertent Sensitive Information Disclosure During On-Site Verification Inspections at US Facilities [R]. Swindle, D. W., Brenner, L. M. 1989.
