Privacy preserving distributed data mining based on multi-objective optimization and algorithmic game theory.

Abstract

Use of technology for data collection and analysis has seen unprecedented growth in the last couple of decades. Individuals and organizations generate huge amounts of data through everyday activities. This data is either centralized for pattern identification or mined in a distributed fashion for efficient knowledge discovery and collaborative computation. This, obviously, has raised serious concerns about privacy. The data mining community has responded to this challenge by developing a new breed of algorithms that are privacy preserving. Specifically, cryptographic techniques for secure multi-party function evaluation form the class of privacy preserving data mining algorithms for distributed computation environments. However, these algorithms require all participants in the distributed system to follow a monolithic privacy model and also make strong assumptions about the behavior of the participating entities. These conditions do not necessarily hold in practice. Therefore, most of the existing work in privacy preserving distributed data mining fails to serve its purpose when applied to large real-world distributed data mining applications.

In this dissertation we develop a novel framework for privacy preserving distributed data mining that allows personalization of privacy requirements for individuals in a large distributed system and removes certain assumptions about participant behavior, thereby making the framework efficient and adaptable to real-world use.

First, we propose the idea of personalized privacy for individuals in a large distributed system, based on the fact that privacy is a social concept. Different parties in a distributed computing environment have varied privacy requirements for their data as well as varying availability of computation and communication resources. Therefore, we model privacy as a multi-objective optimization function in which each party attempts to find the optimal choice between two conflicting objectives: (i) maximizing data privacy, and (ii) minimizing the cost associated with the privacy guarantee. Each party optimizes its own objective to determine the privacy model parameter that satisfies its privacy and cost requirements, and then participates in the collaborative computation.

Second, to address the issue of assumptions about user behavior in cryptography-based privacy preservation techniques, we formulate privacy preserving distributed data mining as a game. The participating entities are the players of the game, and the strategies they adopt in communicating their data, performing the necessary computations, and attacking others' data to reveal personal information determine the outcome of the game in terms of the quality of the data mining results. Knowing that, in the absence of a supervisor, the tendency of any player in this game would be to cheat, we design a penalizing mechanism and blend it with the distributed data mining algorithm to obtain a self-correcting system that forces parties to follow the protocol and not cheat.

The framework we propose is independent of the choice of privacy model for the distributed computation and is applicable to any privacy preserving data mining application involving multi-party function evaluation in a distributed environment. To demonstrate the working of our framework, we have adapted it to several real-life distributed data mining applications such as web advertisement ranking, distributed feature selection, and online similarity identification in browsing patterns.
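To make the privacy-versus-cost trade-off concrete, the following sketch shows how a single party might select its own privacy-model parameter by scalarizing the two conflicting objectives. The utility shapes, weights, and parameter range are illustrative assumptions for this record, not the dissertation's actual formulation.

```python
# Minimal sketch of per-party privacy/cost optimization (illustrative only).
import numpy as np

def choose_privacy_parameter(w_privacy, w_cost,
                             candidates=np.linspace(0.1, 10.0, 200)):
    """Pick the privacy-model parameter (e.g., a noise scale) that best
    balances this party's privacy benefit against its resource cost."""
    privacy_benefit = 1.0 - np.exp(-candidates)  # assumed: grows, then saturates
    resource_cost = 0.05 * candidates            # assumed: linear in the parameter
    utility = w_privacy * privacy_benefit - w_cost * resource_cost
    return candidates[np.argmax(utility)]

# Each party weighs the objectives by its own requirements, then joins the
# collaborative computation with the parameter it selected.
print(choose_privacy_parameter(w_privacy=1.0, w_cost=0.2))  # privacy-sensitive party
print(choose_privacy_parameter(w_privacy=0.4, w_cost=1.0))  # resource-constrained party
```

The penalizing mechanism can likewise be illustrated with a toy expected-payoff comparison: once the expected penalty for detected cheating exceeds the gain from cheating, following the protocol becomes the rational strategy. The payoff numbers below are hypothetical.

```python
# Toy expected-payoff view of the self-correcting mechanism (hypothetical numbers).
def expected_payoff(strategy, base=10.0, cheating_gain=3.0,
                    detection_prob=0.5, penalty=8.0):
    if strategy == "follow":
        return base
    # "cheat": pocket the extra gain, but risk the penalty if caught.
    return base + cheating_gain - detection_prob * penalty

print(max(["follow", "cheat"], key=expected_payoff))
# prints "follow" whenever detection_prob * penalty > cheating_gain
```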
We have designed mechanisms for privacy preserving sum computation and inner product computation in a distributed environment and have adapted the framework to the Bayes optimal model of privacy and the epsilon-differential privacy model. We have simulated the working of the distributed applications and presented experimental results for each of the algorithms developed, using the Distributed Data Mining Toolkit (DDMT) developed by the DIADIC laboratory at UMBC.
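As a concrete illustration of what a privacy preserving sum primitive can look like (a generic secure-sum plus Laplace-noise sketch, not the mechanism developed in the dissertation), each party may split its value into random additive shares so that no single share reveals its input, and may optionally perturb its value with Laplace noise before sharing as a simple local route to epsilon-differential privacy.

```python
# Generic sketch of privacy preserving distributed sum computation
# (illustrative; not the dissertation's protocol).
import numpy as np

def split_into_shares(value, n_shares, rng):
    """Additively share a value: shares look random but sum back to the value."""
    masks = rng.uniform(-1e6, 1e6, size=n_shares - 1)
    return np.append(masks, value - masks.sum())

def private_sum(values, epsilon=None, sensitivity=1.0, seed=0):
    """Sum per-party values. No single share exposes a party's input; when
    epsilon is set, each party first adds Laplace noise with scale
    sensitivity / epsilon before sharing."""
    rng = np.random.default_rng(seed)
    noisy = [v + (rng.laplace(0.0, sensitivity / epsilon) if epsilon else 0.0)
             for v in values]
    all_shares = np.concatenate([split_into_shares(v, 3, rng) for v in noisy])
    return all_shares.sum()  # the aggregator only ever sees shares

print(private_sum([4.0, 7.5, 2.5]))               # masks cancel: ~14.0 (up to rounding)
print(private_sum([4.0, 7.5, 2.5], epsilon=0.5))  # noisy, differentially private total
```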

Record details

  • Author: Das, Kamalika
  • Author affiliation: University of Maryland, Baltimore County
  • Degree-granting institution: University of Maryland, Baltimore County
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 255 p.
  • Total pages: 255
  • Format: PDF
  • Language: English
  • Date added: 2022-08-17 11:38:06
