A sanitization approach for big data with improved data utility


Abstract

Collaborative data mining can expose sensitive patterns in the data that the data owner would prefer to keep private. Sensitive Pattern Hiding (SPH) is a subfield of data mining that addresses this problem. However, most existing approaches to hiding sensitive patterns cause substantial side effects on non-sensitive patterns, which in turn reduces the utility of the sanitized dataset. Furthermore, most of them are sequential in nature, cannot cope with massive amounts of data, and often result in high execution times. To address these challenges of utility and infeasibility, two parallelized approaches named PGVIR and PHCR are proposed, both based on the Spark parallel computing framework. They modify the data so that no sensitive patterns can be extracted while maintaining the utility of the sanitized dataset. Experiments on benchmark data show that PGVIR scales better and PHCR causes fewer side effects on the data than existing techniques.
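To make the notions of "hiding" and "side effect" concrete, here is a minimal sequential sketch in Python of a generic support-reduction heuristic. It is not PGVIR or PHCR, whose details the abstract does not give. The function names, the toy transactions, the support threshold, and the choice of which item to delete (`victim`) are all assumptions made for illustration.

```python
from itertools import combinations


def support(transactions, itemset):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force enumeration of frequent itemsets (fine for toy data only)."""
    items = sorted(set().union(*transactions))
    result = set()
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            if support(transactions, candidate) >= min_support:
                result.add(candidate)
    return result


def hide_pattern(transactions, sensitive, min_support, victim):
    """Delete `victim` from supporting transactions until the sensitive
    itemset is no longer frequent. Generic heuristic, assumed for illustration."""
    sanitized = [set(t) for t in transactions]
    for t in sanitized:
        if support(sanitized, sensitive) < min_support:
            break
        if sensitive <= t:
            t.discard(victim)
    return sanitized


def lost_patterns(original, sanitized, sensitive, min_support):
    """Side effect: non-sensitive itemsets that were frequent before
    sanitization but are not frequent afterwards."""
    before = frequent_itemsets(original, min_support)
    after = frequent_itemsets(sanitized, min_support)
    return {p for p in before - after if not sensitive <= p}


# Hypothetical toy data: five transactions over items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a", "c"}]
sensitive = frozenset({"a", "b"})
min_support = 3

sanitized = hide_pattern(transactions, sensitive, min_support, victim="b")
lost = lost_patterns(transactions, sanitized, sensitive, min_support)
print(support(sanitized, sensitive), lost)
```

In this toy run, hiding {a, b} by deleting `b` from one transaction also pushes the non-sensitive itemset {b, c} below the threshold. That lost pattern is the kind of side effect the abstract refers to, and a utility-aware method would choose its modifications to keep such losses small.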