IEEE Transactions on Dependable and Secure Computing

Differentially Private k-Means Clustering With Convergence Guarantee



Abstract

Iterative clustering around representative points is an effective clustering technique that helps us learn insights from data to support various important applications. Unfortunately, it also opens security holes that may allow adversaries with some background knowledge to infer individuals' private information. To protect individual privacy against such inference attacks, preserving differential privacy in iterative clustering algorithms has been extensively studied. Existing differentially private clustering algorithms adopt the same framework: they compute differentially private centroids iteratively by running Lloyd's k-means algorithm to obtain the actual centroids and then perturbing them with a differential privacy mechanism. These algorithms suffer from the lack of a convergence guarantee, i.e., they provide no guarantee of terminating at a solution of Lloyd's algorithm within a bounded number of iterations. This problem severely impacts their clustering quality and execution efficiency. To address it, this article follows the same centroid-updating pattern as existing work in interactive settings; however, we propose a novel framework for injecting differential privacy into the actual centroids. Specifically, to ensure convergence, we keep the perturbed centroids of the previous iteration t-1 to compute a convergence zone for each cluster in the current iteration t, into which we inject differential privacy noise. To achieve a satisfactory convergence rate, we further control the orientation of centroid movement in each cluster using two strategies: one takes the orientation of centroid movement from iteration t-1 to iteration t (past knowledge); the other additionally uses the orientation from iteration t to iteration t+1 (future knowledge). We prove that, in the expected case, our algorithm (under either strategy) converges to a solution of Lloyd's algorithm in at most twice as many iterations as Lloyd's algorithm.
Furthermore, when using both past and future knowledge, we prove that our algorithm converges to the same solution as Lloyd's algorithm (for the same initial centroids) with high probability, at the cost of slower convergence than using past knowledge alone, due to the duplicated operations each iteration requires to compute the future knowledge. We perform experimental evaluations on seven widely used real-world datasets. The results show that our algorithm outperforms state-of-the-art methods for interactive differentially private clustering, providing guaranteed convergence and better clustering quality while meeting the same differential privacy requirements.
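The baseline framework the abstract critiques — run a Lloyd update to obtain the actual centroids, then perturb them with a differential privacy mechanism — can be sketched as follows. This is an illustrative DP-Lloyd-style sketch, not the authors' algorithm: the function name `dp_lloyd`, the even budget split between noisy counts and noisy sums, and the assumption that every record lies in [0, 1]^d are our own simplifications for exposition.

```python
import numpy as np

def dp_lloyd(data, k, n_iters, epsilon, rng=None):
    """Illustrative baseline: differentially private Lloyd iteration.

    Assumes each record lies in [0, 1]^d, so one record changes a cluster's
    count by at most 1 and its coordinate-wise sum by at most d in L1 norm.
    The total budget epsilon is divided evenly over n_iters iterations
    (simple sequential composition), then split between counts and sums.
    """
    rng = np.random.default_rng(rng)
    n, d = data.shape
    # Initialize centroids from k distinct data points.
    centroids = data[rng.choice(n, size=k, replace=False)]
    eps_iter = epsilon / n_iters
    for _ in range(n_iters):
        # Assign each point to its nearest current centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            # Laplace noise calibrated to L1 sensitivity: 1 for the count,
            # d for the sum; each query gets half of eps_iter.
            noisy_count = len(members) + rng.laplace(scale=2.0 / eps_iter)
            noisy_sum = members.sum(axis=0) + rng.laplace(scale=2.0 * d / eps_iter, size=d)
            if noisy_count > 1.0:
                centroids[j] = np.clip(noisy_sum / noisy_count, 0.0, 1.0)
    return centroids
```

Because the noise is injected wherever the updated centroid happens to land, nothing ties the perturbed centroid to a region that preserves Lloyd's descent property — which is exactly the missing convergence guarantee the paper addresses by restricting the noise to a per-cluster convergence zone.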
