International Conference on Service Operations and Logistics, and Informatics

Anonymization and Analysis of Horizontally and Vertically Divided User Profile Databases with Multiple Sensitive Attributes


Abstract

Preventing the identification of individuals is important when data analysts must guarantee the safety of the data they work with. One approach to this problem alters or deletes part of the data values. For this purpose, the attributes of each record are divided into three groups: identifier (ID), quasi-identifier (QID), and sensitive attribute (SA). An ID is data that identifies an individual directly, such as a name. QIDs are attributes that can identify an individual when combined, such as age and birthplace. An SA is highly important information that should not be exposed once the data is linked to an individual. Building on these concepts, safety metrics for data, such as l-diversity, have been proposed. Under l-diversity, we assume that the SA value is not known to anyone, and we process the data to prevent attackers from identifying individuals. However, there are scenarios in which existing methods cannot protect the data against an invasion of privacy. When multiple organizations carry out an analysis together, they integrate their data to make the research effective. Although they can obtain useful results, the integrated data could include information that attackers can use to identify people. Specifically, if the attacker is an institution that provides data, it can use the SA values of its own data as QID values. The assumption of l-diversity is then violated, so the existing safety metric no longer protects the data. In this paper, we propose a new anonymization method that conceals organizations' important data by inserting dummy values, thereby enabling analysts to use the data safely. We also provide a calculation method that reduces the influence of the noise generated by the dummy insertion. We confirm the effectiveness of these methods by measuring accuracy in data analysis experiments.
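
The weakness described in the abstract can be illustrated with a small sketch. It is not the paper's method: it only checks distinct l-diversity (the minimum number of distinct SA values within any group of records sharing the same QID values) on a made-up table. The column names, the toy records, and the helper `distinct_l` are all assumptions introduced for illustration. The second call models a data-providing organization that already knows its own attribute (`income` here) and can therefore treat it as an extra QID.

```python
from collections import defaultdict


def distinct_l(records, qid_cols, sa_col):
    """Return the smallest number of distinct SA values in any QID group.

    Hypothetical helper: a table satisfies distinct l-diversity for every
    l up to the value returned here.
    """
    groups = defaultdict(set)
    for record in records:
        key = tuple(record[col] for col in qid_cols)
        groups[key].add(record[sa_col])
    return min(len(values) for values in groups.values())


# Toy integrated table (invented data). "disease" comes from one
# organization and "income" from another; both are sensitive attributes.
records = [
    {"age": "30-39", "birthplace": "Tokyo", "disease": "flu", "income": "low"},
    {"age": "30-39", "birthplace": "Tokyo", "disease": "asthma", "income": "high"},
    {"age": "30-39", "birthplace": "Tokyo", "disease": "diabetes", "income": "mid"},
    {"age": "40-49", "birthplace": "Osaka", "disease": "flu", "income": "low"},
    {"age": "40-49", "birthplace": "Osaka", "disease": "asthma", "income": "low"},
    {"age": "40-49", "birthplace": "Osaka", "disease": "diabetes", "income": "high"},
]

# An outside attacker only knows the ordinary QIDs.
outsider_l = distinct_l(records, ["age", "birthplace"], "disease")

# The organization that supplied "income" can use it as an additional QID.
insider_l = distinct_l(records, ["age", "birthplace", "income"], "disease")

print(outsider_l, insider_l)
```

In this toy table the groups seen by an outsider each contain three distinct diseases, while the insider's finer grouping isolates records whose disease value is unique, which is the violated assumption the abstract refers to.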
