Conference: IEEE Conference on Computer Communications

Calibrate: Frequency Estimation and Heavy Hitter Identification with Local Differential Privacy via Incorporating Prior Knowledge



Abstract

Estimating the frequencies of certain items among a population is a basic step in data analytics. It enables more advanced analytics (e.g., heavy hitter identification, frequent pattern mining), client software optimization, and detection of unwanted or malicious hijacking of user settings in browsers. Frequency estimation and heavy hitter identification with local differential privacy (LDP) protect both user privacy and the data collector. Existing LDP algorithms cannot leverage 1) prior knowledge about the noise in the estimated item frequencies or 2) prior knowledge about the true item frequencies. As a result, they achieve suboptimal performance in practice. In this work, we aim to design LDP algorithms that can leverage such prior knowledge. Specifically, we design Calibrate to incorporate the prior knowledge via statistical inference. Calibrate can be appended to an existing LDP algorithm to reduce its estimation errors. We model the prior knowledge about the noise and about the true item frequencies as two probability distributions. Given these two distributions and the estimated frequency of an item produced by an existing LDP algorithm, Calibrate computes the conditional probability distribution of the item's true frequency and uses the mean of that distribution as the calibrated frequency for the item. Estimating the two probability distributions is challenging because of data sparsity. We address the challenge by integrating techniques from statistics and machine learning. Our empirical results on two real-world datasets show that Calibrate significantly outperforms state-of-the-art LDP algorithms for frequency estimation and heavy hitter identification.
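The calibration step described in the abstract (taking the mean of the conditional distribution of the true frequency given the estimated frequency) can be illustrated with a minimal sketch. The abstract does not say which distribution families are used or how they are estimated, so everything concrete here is an assumption made for illustration: the noise is taken to be zero-mean Gaussian with a known standard deviation `noise_std`, the prior over true frequencies is a discretized, power-law-shaped distribution on a hypothetical grid, and the function name `calibrate` is invented for this example.

```python
import math


def calibrate(f_hat, grid, prior, noise_std):
    """Return the posterior mean of the true frequency given estimate f_hat.

    Assumptions (illustrative, not taken from the paper):
      * f_hat = f + s, where s is zero-mean Gaussian noise with std noise_std
      * f takes values in `grid` with (possibly unnormalized) weights `prior`
    """
    # Unnormalized log-posterior for each candidate true frequency f:
    # log prior(f) + log Gaussian likelihood of observing f_hat.
    log_post = [
        math.log(p) - 0.5 * ((f_hat - f) / noise_std) ** 2
        for f, p in zip(grid, prior)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    # Mean of the conditional distribution = calibrated frequency.
    return sum(w * f for w, f in zip(weights, grid)) / total


# Hypothetical setup: candidate frequencies from 0 to 0.1 in steps of 1e-4,
# with a prior that puts most of its mass on small frequencies.
grid = [i / 10000 for i in range(1001)]
prior = [1.0 / (f + 1e-3) ** 2 for f in grid]  # normalization cancels out

noisy = calibrate(0.01, grid, prior, noise_std=0.005)
precise = calibrate(0.01, grid, prior, noise_std=1e-5)
negative = calibrate(-0.002, grid, prior, noise_std=0.005)
print(noisy, precise, negative)
```

Under these assumptions, a noisy estimate is pulled toward the small frequencies favored by the prior, an almost noise-free estimate is left essentially unchanged, and a negative estimate is mapped back into the valid range of frequencies.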
