International Conference on Service Operations and Logistics, and Informatics

Anonymization and Analysis of Horizontally and Vertically Divided User Profile Databases with Multiple Sensitive Attributes

Abstract

Preventing the identification of individuals is important when data analysts must guarantee the safety of the data they work with. One proposed approach to this problem is to alter or delete part of the data values. In this approach, the attributes of individual records are divided into three groups: identifiers (IDs), quasi-identifiers (QIDs), and sensitive attributes (SAs). An ID directly identifies an individual, such as a name. QIDs are attributes that can identify an individual when combined, such as age and birthplace. An SA is highly important information that must not be exposed if a record is linked to an individual. Based on these concepts, safety metrics such as l-diversity have been proposed. The l-diversity model assumes that no one knows the SA values in advance, and the data are processed so that attackers cannot identify individuals. However, there are scenarios in which existing methods cannot protect the data against privacy invasion. When multiple organizations conduct a joint analysis, they integrate their data to carry out more effective research. Although they can obtain valuable results, the integrated data may include information that attackers can use to identify people. Specifically, if the attacker is one of the institutions that provided data, it can use the SA values in its own data as QID values. This violates the assumption of l-diversity, so the existing safety metric no longer protects the data. In this paper, we propose a new anonymization method that conceals organizations' important data by inserting dummy values, thereby enabling analysts to use the data safely. We also provide a calculation method that reduces the influence of the noise introduced by dummy insertion. We confirm the effectiveness of these methods by measuring accuracy in data analysis experiments.
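The attack scenario above can be made concrete with a small sketch. It assumes distinct l-diversity (every group of records sharing the same QID values must contain at least l distinct SA values) and a hypothetical integrated table in which one organization contributed `disease` and another contributed `income`. The table, attribute names, and the `pad_with_dummies` helper are illustrative assumptions; the padding step is a naive dummy-record insertion, not the paper's algorithm, and the noise-reduction calculation is not shown.

```python
from collections import defaultdict


def equivalence_classes(records, qid_attrs):
    """Group records by their values on the quasi-identifier attributes."""
    classes = defaultdict(list)
    for rec in records:
        classes[tuple(rec[a] for a in qid_attrs)].append(rec)
    return classes


def min_distinct_sa(records, qid_attrs, sa_attr):
    """Smallest number of distinct SA values found in any equivalence class."""
    classes = equivalence_classes(records, qid_attrs)
    return min(len({rec[sa_attr] for rec in group}) for group in classes.values())


def is_distinct_l_diverse(records, qid_attrs, sa_attr, l):
    """True if every equivalence class holds at least l distinct SA values."""
    return min_distinct_sa(records, qid_attrs, sa_attr) >= l


def pad_with_dummies(records, qid_attrs, sa_attr, l, domain):
    """Hypothetical naive padding: add dummy records carrying unseen SA values
    to each class that has fewer than l distinct SA values."""
    padded = list(records)
    for group in equivalence_classes(records, qid_attrs).values():
        present = {rec[sa_attr] for rec in group}
        missing = sorted(v for v in domain if v not in present)
        needed = max(0, l - len(present))
        for value in missing[:needed]:
            dummy = {a: group[0][a] for a in qid_attrs}  # copy the class's QID values
            dummy[sa_attr] = value                       # inject a dummy SA value
            padded.append(dummy)
    return padded


# Hypothetical integrated table; age and region are already generalized QIDs.
records = [
    {"age": "20-29", "region": "East", "disease": "flu", "income": "low"},
    {"age": "20-29", "region": "East", "disease": "flu", "income": "low"},
    {"age": "20-29", "region": "East", "disease": "cancer", "income": "high"},
    {"age": "20-29", "region": "East", "disease": "cancer", "income": "mid"},
    {"age": "30-39", "region": "West", "disease": "flu", "income": "mid"},
    {"age": "30-39", "region": "West", "disease": "asthma", "income": "high"},
    {"age": "30-39", "region": "West", "disease": "flu", "income": "low"},
    {"age": "30-39", "region": "West", "disease": "asthma", "income": "high"},
]

# Both SAs are distinct 2-diverse with respect to the ordinary QIDs.
print(is_distinct_l_diverse(records, ["age", "region"], "disease", 2))  # True
print(is_distinct_l_diverse(records, ["age", "region"], "income", 2))   # True

# The organization that supplied `disease` knows it, so it treats it as a QID.
attacker_qids = ["age", "region", "disease"]
print(min_distinct_sa(records, attacker_qids, "income"))  # 1

padded = pad_with_dummies(records, attacker_qids, "income", 2, ["high", "low", "mid"])
print(is_distinct_l_diverse(padded, attacker_qids, "income", 2))  # True
```

The padded table restores distinct 2-diversity even when `disease` is treated as a QID, but the dummy records also shift the distribution of `disease` and `income` that analysts see. That distortion is the kind of noise the correction method mentioned in the abstract is meant to reduce.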
