Journal: Data & Knowledge Engineering

An integrated framework for de-identifying unstructured medical data

Abstract

While there is an increasing need to share medical information for public health research, such data sharing must preserve patient privacy without disclosing any information that can be used to identify a patient. A considerable amount of research in the data privacy community has been devoted to formalizing the notion of identifiability and to developing anonymization techniques, but this work has focused exclusively on structured data. On the other hand, efforts in the medical informatics community to de-identify medical text documents rely on simple identifier removal or grouping techniques, without taking advantage of research developments in the data privacy community. This paper attempts to fill these gaps. It presents a framework and prototype system for de-identifying health information that includes both structured and unstructured data. We empirically study three classifiers for extracting identifying attributes from unstructured data: a simple Bayesian classifier, a Bayesian classifier combined with a sampling-based technique, and a conditional random field (CRF) based classifier. We then apply a k-anonymization based technique to de-identify the extracted data while preserving maximum data utility. We present a set of preliminary evaluations showing the effectiveness of our approach.
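The sketch below illustrates what k-anonymizing extracted attributes can look like. A table is k-anonymous when every combination of quasi-identifier values is shared by at least k records. This is not the authors' algorithm; it is a minimal illustration built on several assumptions:

- Each record has already been reduced to three hypothetical quasi-identifiers: age, ZIP code and gender.
- The generalization hierarchies are hand-written.
- A simple full-domain generalization search returns the k-anonymous option with the fewest total generalization steps. This is a crude proxy for "maximum data utility".

```python
from collections import Counter
from itertools import product
from typing import List, Tuple

# Hypothetical record layout: (age, zipcode, gender), assumed already extracted from text.
Record = Tuple[int, str, str]
Row = Tuple[str, str, str]


def generalize_age(age: int, level: int) -> str:
    """Hypothetical age hierarchy: 0 = exact, 1 = 10-year band, 2 = fully suppressed."""
    if level == 0:
        return str(age)
    if level == 1:
        low = (age // 10) * 10
        return f"{low}-{low + 9}"
    return "*"


def generalize_zip(zipcode: str, level: int) -> str:
    """Hypothetical ZIP hierarchy: mask the last `level` digits with '*'."""
    if level == 0:
        return zipcode
    return zipcode[:-level] + "*" * level


def is_k_anonymous(rows: List[Row], k: int) -> bool:
    """True if every distinct quasi-identifier combination occurs at least k times."""
    return all(count >= k for count in Counter(rows).values())


def k_anonymize(records: List[Record], k: int) -> Tuple[List[Row], Tuple[int, int]]:
    """Full-domain generalization: try (age_level, zip_level) pairs in order of
    total generalization and return the first one that is k-anonymous.
    Gender is left unchanged in this sketch."""
    candidates = sorted(product(range(3), range(6)), key=lambda lv: (sum(lv), lv))
    for age_lv, zip_lv in candidates:
        rows = [
            (generalize_age(age, age_lv), generalize_zip(zipcode, zip_lv), gender)
            for age, zipcode, gender in records
        ]
        if is_k_anonymous(rows, k):
            return rows, (age_lv, zip_lv)
    raise ValueError("no generalization in the search space achieves k-anonymity")


if __name__ == "__main__":
    # Hypothetical attributes, as if extracted from clinical notes.
    extracted = [(34, "30322", "F"), (36, "30329", "F"), (45, "30310", "M"), (47, "30318", "M")]
    rows, levels = k_anonymize(extracted, k=2)
    print(levels)  # → (1, 1): 10-year age bands, last ZIP digit masked
    for row in rows:
        print(row)
```

In this sketch gender is never generalized. If any gender group is smaller than k, the search therefore fails. A fuller treatment would also generalize or suppress such values, and would use a better utility measure than counting generalization steps.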