Conference: Annual IEEE India Conference

Providing anonymity using top down specialization on Big Data using hadoop framework



Abstract

Data sharing has become a trend among internet users, which has led to privacy concerns. Data anonymization addresses these concerns by hiding the sensitive information that could cause a breach of privacy. The number of users sharing data has increased tremendously, which has led to the generation of huge volumes of data. Data too large to be managed by conventional systems and software is termed "Big Data". Existing systems and anonymization approaches fail to handle Big Data efficiently. Handling Big Data is complex because, to perform any operation, a system must be able to process such volumes in acceptable time; it must therefore be highly scalable. This paper provides anonymity for Big Data in a highly scalable fashion. It uses the MapReduce framework, which gains scalability through job-level and task-level parallelization. K-anonymity, which generalizes the data, is used to provide anonymity. Job-level parallelization refers to running multiple MapReduce jobs simultaneously. Task-level parallelization refers to running multiple mappers/reducers over data splits. To make full use of parallel computation, the anonymization process is split into two phases. In the first phase, the huge data set is partitioned into small data sets, each of which is anonymized. Because the resulting data is inconsistent, a second phase merges the anonymized data into one single data set. The paper thereby accomplishes the specialization computation in a highly scalable manner.
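The two-phase idea in the abstract can be illustrated with a minimal sketch. It is plain Python rather than Hadoop MapReduce, and it is not the paper's implementation: the single `age` attribute, the three-level generalization hierarchy, the function names, and the merge rule (adopt the most general level any partition needed) are all assumptions made for illustration. It also assumes every partition holds at least k records.

```python
from collections import Counter

# Hypothetical hierarchy: 0 = fully suppressed, 1 = decade range, 2 = exact age.
MAX_LEVEL = 2


def generalize(age, level):
    """Map an age to its value at the given specialization level."""
    if level == 0:
        return "*"
    if level == 1:
        low = (age // 10) * 10
        return f"{low}-{low + 9}"
    return str(age)


def is_k_anonymous(ages, level, k):
    """True if every generalized value is shared by at least k records."""
    counts = Counter(generalize(a, level) for a in ages)
    return all(c >= k for c in counts.values())


def specialize(ages, k):
    """Top-down: start fully general, specialize while k-anonymity holds."""
    level = 0
    while level < MAX_LEVEL and is_k_anonymous(ages, level + 1, k):
        level += 1
    return level


def two_phase_anonymize(ages, k, num_partitions):
    # Phase 1: split the data and anonymize each partition independently.
    # In a MapReduce setting these would run in parallel; here it is a loop.
    partitions = [ages[i::num_partitions] for i in range(num_partitions)]
    levels = [specialize(p, k) for p in partitions]
    # Phase 2: partitions may have reached different levels (inconsistent),
    # so adopt the most general one and apply it to the whole data set.
    merged_level = min(levels)
    return merged_level, [generalize(a, merged_level) for a in ages]


if __name__ == "__main__":
    level, anonymized = two_phase_anonymize(
        [30, 31, 30, 35, 41, 42, 41, 47], k=2, num_partitions=2
    )
    print(level, anonymized)
```

In this example the first partition could keep exact ages while the second needs decade ranges, so the merge step falls back to decade ranges for every record. Coarsening only merges groups, so the merged result remains k-anonymous under the stated assumptions.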
