We study the problem of estimating a set of $d$ linear queries with respect to some unknown distribution $p$ over a domain $[J]$, based on a sensitive data set of $n$ individuals, under the constraint of local differential privacy. This problem subsumes a wide range of estimation tasks, e.g., distribution estimation and $d$-dimensional mean estimation. We provide new algorithms for both the offline (non-adaptive) and adaptive versions of this problem. In the offline setting, the set of queries is fixed before the algorithm starts. In the regime where $n < d^2/\log(J)$, our algorithms attain an $L_2$ estimation error that is independent of $d$. For the special case of distribution estimation, we show that projecting the output estimate of an algorithm due to [Acharya et al. 2018] onto the probability simplex yields an $L_2$ error that depends only sub-logarithmically on $J$ in the regime where $n < J^2/\log(J)$. Our bounds are within a factor of at most $(\log(J))^{1/4}$ of the optimal $L_2$ error. These results show that accurate estimation of linear queries is possible in high-dimensional settings under the $L_2$ error criterion. In the adaptive setting, the queries are generated over $d$ rounds, one query at a time. In each round, a query can be chosen adaptively based on the entire history of previous queries and answers. We give an algorithm for this problem with optimal $L_\infty$ estimation error (the worst error in the estimated values of the queries w.r.t. the data distribution). Our bound matches a lower bound on the $L_\infty$ error in the offline version of this problem [Duchi et al. 2013].
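The projection step in the distribution-estimation result can be illustrated with a short sketch. It assumes the standard sort-and-threshold algorithm for Euclidean ($L_2$) projection onto the probability simplex $\{x : x_j \ge 0,\ \sum_j x_j = 1\}$. The function name `project_to_simplex` is hypothetical. The input `v` stands in for the raw output of some locally private estimator, whose entries may be negative or may not sum to one. The estimator itself (e.g., the one due to [Acharya et al. 2018]) is not implemented here. Because the simplex is convex and contains the true $p$, this projection never increases the $L_2$ distance to $p$.

```python
from typing import List


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean (L2) projection of v onto the probability simplex.

    Hypothetical helper: sorts v in descending order, finds the largest
    index rho with u_rho - (sum_{i<=rho} u_i - 1)/rho > 0, and shifts all
    entries down by the corresponding threshold theta, clipping at zero.
    """
    if not v:
        raise ValueError("v must be non-empty")
    u = sorted(v, reverse=True)  # entries in descending order
    running_sum = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        running_sum += u_j
        t = (running_sum - 1.0) / j  # candidate threshold using the top j entries
        if u_j - t > 0:
            theta = t  # condition holds on a prefix; keep the last valid theta
    # Subtract the threshold and clip negatives to obtain a valid distribution.
    return [max(x - theta, 0.0) for x in v]


if __name__ == "__main__":
    noisy_estimate = [0.5, 0.8, -0.1]  # illustrative values, not real data
    print(project_to_simplex(noisy_estimate))
```

For the illustrative input above, the threshold is $\theta = 0.15$. The projected vector is therefore approximately `[0.35, 0.65, 0.0]`, which is non-negative and sums to one. A vector already on the simplex is returned unchanged (up to floating-point error).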