Conference: International conference on healthcare science and engineering
An Improved Data Anonymization Algorithm for Incomplete Medical Dataset Publishing

Abstract

To protect patients' sensitive information and prevent privacy leakage, medical datasets must be anonymized before they are published. Most existing anonymization techniques discard records with missing data, which causes large differences in data characteristics during anonymization and results in severe information loss. To solve this problem, we propose DAIMDL, a novel data anonymization algorithm for incomplete medical datasets based on the L-diversity algorithm. While preserving records with missing data, DAIMDL clusters the data using an improved k-member algorithm, and in the clustering stage it calculates distance from the information entropy generated by data generalization. The groups obtained by clustering are then generalized. Experimental results show that DAIMDL protects patients' sensitive attributes better, reduces information loss when anonymizing data with missing values, and improves the availability of the dataset.
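
The abstract does not give DAIMDL's actual distance function, clustering rules, or generalization hierarchy, so the following is only a minimal Python sketch of the general idea it describes: keep records with missing values, grow clusters of at least k records greedily, and measure the cost of a group with an entropy-based score. Every name here (`group_entropy`, `greedy_k_member`, `generalize`, `is_l_diverse`) is hypothetical, and the choices to skip missing values (`None`) when computing entropy and to generalize an attribute to the set of values observed in the group are assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter

def group_entropy(rows):
    """Assumed cost: sum over attributes of the Shannon entropy of observed values.
    Missing values (None) are skipped instead of the record being discarded."""
    total = 0.0
    for column in zip(*rows):
        observed = [v for v in column if v is not None]
        for count in Counter(observed).values():
            p = count / len(observed)
            total -= p * math.log2(p)
    return total

def greedy_k_member(records, k):
    """Grow each cluster to k records, always adding the record that keeps
    the cluster's entropy cost lowest. Returns lists of record indices."""
    remaining = list(range(len(records)))
    clusters = []
    while len(remaining) >= k:
        cluster = [remaining.pop(0)]  # seed with the next unassigned record
        while len(cluster) < k:
            best = min(
                remaining,
                key=lambda i: group_entropy([records[j] for j in cluster + [i]]),
            )
            remaining.remove(best)
            cluster.append(best)
        clusters.append(cluster)
    if not clusters:  # fewer than k records in total: one single group
        return [remaining] if remaining else []
    for i in remaining:  # leftovers join the cluster whose cost grows least
        target = min(
            clusters,
            key=lambda c: group_entropy([records[j] for j in c + [i]])
            - group_entropy([records[j] for j in c]),
        )
        target.append(i)
    return clusters

def generalize(records, cluster):
    """Replace each attribute by the sorted set of values observed in the cluster
    ('*' if every value is missing)."""
    rows = [records[i] for i in cluster]
    return tuple(
        tuple(sorted({v for v in column if v is not None})) or ("*",)
        for column in zip(*rows)
    )

def is_l_diverse(sensitive, cluster, l):
    """A cluster is l-diverse here if it holds at least l distinct sensitive values."""
    return len({sensitive[i] for i in cluster}) >= l
```

As a usage illustration with made-up data: given `records = [("30-39", "F"), ("30-39", None), ("60-69", "M"), ("60-69", "M")]` and `k = 2`, the incomplete second record is kept and grouped with the first, because adding it introduces no extra entropy on the observed values. A fuller treatment would also enforce the diversity requirement while the clusters are being built instead of checking it afterwards.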