首页> 外文期刊>JMLR: Workshop and Conference Proceedings >Differentially Private Clustering in High-Dimensional Euclidean Spaces
【24h】

Differentially Private Clustering in High-Dimensional Euclidean Spaces

机译:高维欧氏空间中的差分私有聚类

获取原文
获取外文期刊封面目录资料

摘要

We study the problem of clustering sensitive data while preserving the privacy of individuals represented in the dataset, which has broad applications in practical machine learning and data analysis tasks. Although the problem has been widely studied in the context of low-dimensional, discrete spaces, much remains unknown concerning private clustering in high-dimensional Euclidean spaces $mathbb{R}^d$. In this work, we give differentially private and efficient algorithms achieving strong guarantees for $k$-means and $k$-median clustering when $d=Omega(mathsf{polylog}(n))$. Our algorithm achieves clustering loss at most $log^3(n)mathsf{OPT}+mathsf{poly}(log n,d,k)$, advancing the state-of-the-art result of $sqrt{d}mathsf{OPT}+mathsf{poly}(log n,d^d,k^d)$. We also study the case where the data points are $s$-sparse and show that the clustering loss can scale logarithmically with $d$, i.e., $log^3(n)mathsf{OPT}+mathsf{poly}(log n,log d,k,s)$. Experiments on both synthetic and real datasets verify the effectiveness of the proposed method.
机译:我们研究了在保护敏感数据的聚类问题的同时保留数据集中表示的个人的隐私的方法,它在实际的机器学习和数据分析任务中具有广泛的应用。尽管已经在低维,离散空间的背景下对该问题进行了广泛研究,但是关于高维欧几里德空间$ mathbb {R} ^ d $中的私有聚类仍然未知。在这项工作中,当$ d = Omega( mathsf {polylog}(n))$时,我们给出差分私有和有效算法,为$ k $ -means和$ k $-中位数聚类提供有力的保证。我们的算法最多可实现$ log ^ 3(n) mathsf {OPT} + mathsf {poly}( log n,d,k)$的聚类损失,从而提高了$ 的最新结果sqrt {d} mathsf {OPT} + mathsf {poly}( log n,d ^ d,k ^ d)$。我们还研究了数据点为$ s $稀疏的情况,并表明聚类损失可以用$ d $进行对数缩放,即$ log ^ 3(n) mathsf {OPT} + mathsf {poly} ( log n, log d,k,s)$。综合和真实数据集上的实验证明了该方法的有效性。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号