Using multiply imputed, synthetic data to facilitate data sharing.

Abstract

The collection of data by statistical agencies and other statistical organizations for internal use and public release is a complex process. Researchers and policy makers demand high-quality public-use data, while agency concerns regarding confidentiality and respondent protection limit the information that can be released. Even the sharing of data between statistical agencies cannot be done without first protecting the data in question. Advances in computer technology pose a threat to data confidentiality because data intruders are equipped with tools and resources that can be used to link public records with released data. Therefore, to limit disclosures, agencies apply disclosure control techniques to their data prior to release to ensure that respondent information is protected. However, the application of such techniques reduces the utility of the released data. The requirements of agencies to safeguard their data from disclosures limit their ability to share and exchange unperturbed data with one another. Even in situations where agencies desire to work in an honest environment and the exchange of data would benefit agencies and the researchers who study public-use data, data sharing is limited.

One approach agencies can use to safely share their data, and create public-use data in the process, is to exchange synthetic data rather than real data. If the agencies have mutual interests, then it may be advantageous for them to create a combined data set that is accessible to all contributing agencies. This combined data set would give agencies and public-use data users the ability to incorporate more records or attributes into their analyses than were previously available from the individual data sources. To facilitate the sharing of confidential data between agencies, synthetic data methods are used to create multiply imputed, synthetic data sets that can be shared among participating agencies. Inferential methods for combining data sets from multiple sources are derived and then validated based on simulation studies that utilize several different analysis models. Implementation of the proposed data sharing methods on real data requires creativity and an inherent understanding of the data to maintain both the overall structure of the data and the underlying relationships.
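
The abstract does not spell out the inferential methods derived in the thesis, so the following is only a minimal sketch of the standard combining rules for a scalar estimate computed on m multiply imputed, synthetic data sets, not the multi-source methods the thesis proposes. It assumes the usual variance formulas from the synthetic data literature: u_bar + b/m for partially synthetic data and (1 + 1/m) * b - u_bar for fully synthetic data, where b is the between-data-set variance of the point estimates and u_bar is the average of the within-data-set variance estimates. The function name `combine_synthetic` and its arguments are hypothetical, chosen for illustration.

```python
from statistics import mean, variance


def combine_synthetic(estimates, variances, fully_synthetic=False):
    """Combine point estimates from m synthetic data sets (illustrative sketch).

    estimates: point estimate of the quantity of interest from each data set
    variances: the matching within-data-set variance estimates
    Returns (combined point estimate, combined variance estimate).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two data sets and one variance per estimate")

    q_bar = mean(estimates)    # average of the m point estimates
    b = variance(estimates)    # between-data-set variance (divisor m - 1)
    u_bar = mean(variances)    # average within-data-set variance

    if fully_synthetic:
        # Assumed fully synthetic rule; this estimate can come out negative.
        total = (1 + 1 / m) * b - u_bar
    else:
        # Assumed partially synthetic rule.
        total = u_bar + b / m
    return q_bar, total


if __name__ == "__main__":
    # Made-up numbers purely to show the call; they are not results from the thesis.
    print(combine_synthetic([1.0, 2.0, 3.0], [0.2, 0.2, 0.2]))
```

Because the fully synthetic variance estimate subtracts u_bar, it can be negative for small m, so a caller would need to decide how to handle that case before building intervals from it.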

Bibliographic record

  • Author: Kohnen, Christine Noelle
  • Author affiliation: Duke University
  • Degree-granting institution: Duke University
  • Discipline: Statistics
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 143
  • Original format: PDF
  • Language of text: English
