Optimization and statistical estimation for the post randomization method.

Abstract

The field of Statistical Disclosure Control (SDC) aims at developing methodology that balances two objectives: providing data for valid statistical inference and safeguarding confidential information. One of the SDC methods for categorical variables is the Post Randomization Method (PRAM). The basic idea underlying PRAM is to misclassify values of the categorical variables via a known probability mechanism captured by a PRAM matrix. This thesis focuses on three primary methodological developments that enable PRAM to become a more theoretically and practically viable SDC method.

First, we focus on the issue of obtaining valid statistical analysis with data subject to PRAM. The application of PRAM is known to produce biased parameter estimates in generalized linear models (GLMs). We develop and implement EM-type algorithms that take into account the effect of PRAM and obtain asymptotically unbiased estimators of parameters in GLMs when both covariates and response variables are subject to PRAM. The basic ideas are based on the "EM by method of weights" in the missing data literature.

Second, we extend the proposed methodology to deal with dependent covariates when estimating parameters in GLMs, relaxing the assumption of independence of covariates. This is done by modeling the distribution of the covariates subject to PRAM as a product of univariate conditional distributions. This approach advances the PRAM methodology by making it more applicable in practice and results in more accurate estimators of the regression parameters. Results from simulation studies and an application to the 1993 Current Population Survey are presented.

Lastly, we address the issue of obtaining optimal PRAM matrices which produce safe files and maximize data utility with respect to a widely used utility measure for PRAM: entropy-based information loss (EBIL), a variant of Shannon's entropy. We show that for a certain class of PRAM matrices, EBIL displays monotonic properties, which implies that the minimum of EBIL occurs at an extreme point of the convex region that satisfies a pre-determined rule for safe files. Using these properties, we present an algorithm that obtains PRAM matrices which produce safe files with higher data utility when compared to PRAM matrices obtained using built-in numerical methods and routines.
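
To make the basic PRAM mechanism described in the abstract concrete, the following is a minimal sketch, not the thesis's method. It misclassifies a categorical variable using a row-stochastic PRAM matrix and then applies the simple moment-based correction for category counts, which relies on the fact that the expected perturbed counts equal the transposed PRAM matrix times the original counts. The function names (`apply_pram`, `estimate_original_counts`), the 3x3 matrix, and the simulated data are all illustrative assumptions; the EM-type algorithms for GLMs and the EBIL optimization are not shown.

```python
import numpy as np


def apply_pram(values, P, rng):
    """Replace each category index i by j drawn with probability P[i, j].

    P is assumed to be a row-stochastic PRAM matrix (each row sums to 1).
    """
    values = np.asarray(values)
    k = P.shape[0]
    out = np.empty_like(values)
    for i in range(k):
        mask = values == i
        # Draw a new category for every record whose true category is i.
        out[mask] = rng.choice(k, size=int(mask.sum()), p=P[i])
    return out


def estimate_original_counts(perturbed, P):
    """Moment-based estimate of the original category counts.

    Since E[observed counts] = P.T @ original counts, solving that linear
    system gives an unbiased estimate when P is invertible.
    """
    k = P.shape[0]
    observed = np.bincount(perturbed, minlength=k).astype(float)
    return np.linalg.solve(P.T, observed)


# Illustrative (assumed) PRAM matrix: keep a value with probability 0.8,
# otherwise move it to one of the other two categories.
P = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
])

rng = np.random.default_rng(0)
original = rng.choice(3, size=20000, p=[0.6, 0.3, 0.1])  # simulated data
perturbed = apply_pram(original, P, rng)

true_counts = np.bincount(original, minlength=3)
naive_counts = np.bincount(perturbed, minlength=3)
estimate = estimate_original_counts(perturbed, P)
```

Comparing `naive_counts` with `true_counts` shows the distortion introduced by PRAM, while `estimate` should lie close to `true_counts`. Because the PRAM matrix is known to the analyst, this kind of correction is possible; the thesis addresses the harder case of regression parameters in GLMs.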

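Bibliographic details

  • Author

    Woo, Young Ming Jeffrey

  • Author affiliation

    The Pennsylvania State University

  • Degree-granting institution: The Pennsylvania State University
  • Subject: Statistics
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 183 p.
  • Total pages: 183
  • Original format: PDF
  • Language of text: English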
