Annual International Conference on the Theory and Applications of Cryptographic Techniques

Differential Privacy and the People's Data (IACR Distinguished Lecture)



Abstract

Differential Privacy will be the confidentiality protection method of the 2020 US Decennial Census. We explore the technical and social challenges to be faced as the technology moves from the realm of information specialists to the large community of consumers of census data.

Differential privacy is a definition of privacy tailored to the statistical analysis of large datasets. Roughly speaking, differential privacy ensures that anything learnable about an individual could be learned independent of whether the individual opts in or opts out of the data set under analysis (the standard formulation is recalled below). The term has come to denote a field of study, inspired by cryptography and guided by theoretical lower bounds and impossibility results, comprising algorithms, complexity results, sample complexity, definitional relaxations, and uses of differential privacy when privacy is not itself a concern.

From its inception, a motivating scenario for differential privacy has been the US Census: data of the people, analyzed for the benefit of the people, to allocate the people's resources (hundreds of billions of dollars), with a legal mandate for privacy. Over the past 4-5 years, differential privacy has been adopted in a number of industrial settings by Google, Microsoft, Uber, and, with the most fanfare, by Apple. In 2020 it will be the confidentiality protection method for the US Decennial Census. Census data are used throughout government and in thousands of research studies every year. This mainstreaming of differential privacy, the transition from the realm of technically sophisticated information specialists and analysts into much broader use, presents enormous technical and social challenges.

The Fundamental Theorem of Information Reconstruction tells us that overly accurate estimates of too many statistics completely destroy privacy. Differential privacy provides a measure of privacy loss that permits the tracking and control of cumulative privacy loss as data are analyzed and re-analyzed. But provably no method can permit the data to be explored without bound. How will the privacy loss "budget" be allocated? Who will enforce limits?

More pressing for the scientific community are questions of how the multitudes of census data consumers will interact with the data moving forward. The Decennial Census is simple, and the tabulations can be handled well with existing technology. In contrast, the annual American Community Survey, which covers only a few million households yearly, is rich in personal details on subjects from internet access in the home to employment to ethnicity, relationships among persons in the home, and fertility. We are not (yet?) able to offer differentially private algorithms for every kind of analysis carried out on these data.

Historically, confidentiality has been handled by a combination of data summaries, restricted-use access to the raw data, and the release of public-use microdata, a form of noisy individual records. Summary statistics are the bread and butter of differential privacy (a toy illustration follows the abstract), but giving even trusted and trustworthy researchers access to raw data is problematic, as their published findings are a vector for privacy loss: think of the researcher as an arbitrary non-differentially-private algorithm that produces outputs in the form of published findings. The very choice of statistic to be published is inherently not privacy-preserving!
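For reference, the abstract states the definition only informally; the standard formulation of ε-differential privacy (Dwork, McSherry, Nissim, and Smith, 2006), together with the basic composition bound that gives the privacy loss "budget" its meaning, reads:

    % A randomized mechanism M is \varepsilon-differentially private if,
    % for every pair of datasets x, x' differing in one individual's data
    % and every event S over M's outputs,
    \Pr[M(x) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(x') \in S].
    % Basic composition: answering k queries with parameters
    % \varepsilon_1, \dots, \varepsilon_k consumes a total budget of at most
    \varepsilon_{\mathrm{total}} = \sum_{i=1}^{k} \varepsilon_i.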
At the same time, past microdata noising techniques can no longer be considered to provide adequate privacy, but generating synthetic public-use microdata while ensuring differential privacy is a computationally hard problem. Nonetheless, combinations of exciting new techniques give reason for optimism.
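To make the budget mechanics concrete, here is a minimal Python sketch that answers counting queries with the Laplace mechanism and refuses to answer once a fixed privacy loss budget is spent. The class and its names are illustrative assumptions for exposition, not the Census Bureau's actual disclosure avoidance system.

    import numpy as np

    class PrivateCounter:
        """Toy sketch: budgeted differentially private counting queries."""

        def __init__(self, records, total_budget):
            self.records = records          # the raw, sensitive rows
            self.remaining = total_budget   # epsilon left to spend

        def count(self, predicate, epsilon):
            # Enforce the budget: no method permits unbounded exploration.
            if epsilon <= 0 or epsilon > self.remaining:
                raise RuntimeError("privacy loss budget exhausted")
            self.remaining -= epsilon
            true_count = sum(1 for r in self.records if predicate(r))
            # A counting query has sensitivity 1 (one person changes the
            # count by at most 1), so Laplace noise of scale 1/epsilon
            # yields epsilon-differential privacy for this query.
            return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

    # Example: two noisy tabulations, after which the budget runs out.
    rows = [{"age": a} for a in (23, 37, 41, 58, 62, 70)]
    db = PrivateCounter(rows, total_budget=1.0)
    print(db.count(lambda r: r["age"] >= 65, epsilon=0.5))
    print(db.count(lambda r: r["age"] < 40, epsilon=0.5))
    # Any further query now raises RuntimeError: the data cannot be
    # explored without bound.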