Engineering Applications of Artificial Intelligence > Toward sensitive document release with privacy guarantees

Toward sensitive document release with privacy guarantees



Abstract

Privacy has become a serious concern for modern information societies. The sensitive nature of many of the data that are daily exchanged or released to untrusted parties requires that responsible organizations undertake appropriate privacy protection measures. Nowadays, many of these data are texts (e.g., emails, messages posted in social media, healthcare outcomes, etc.) that, because of their unstructured and semantic nature, constitute a challenge for automatic data protection methods. In fact, textual documents are usually protected manually, in a process known as document redaction or sanitization. To do so, human experts identify sensitive terms (i.e., terms that may reveal identities and/or confidential information) and protect them accordingly (e.g., via removal or, preferably, generalization). To relieve experts of this burdensome task, in a previous work we introduced the theoretical basis of C-sanitization, an inherently semantic privacy model that provides the foundation for developing automatic document redaction/sanitization algorithms and offers clear a priori privacy guarantees on data protection. Despite its potential benefits, C-sanitization still presents some limitations in practice (mainly regarding flexibility, efficiency and accuracy). In this paper, we propose a new, more flexible model, named (C, g(C))-sanitization, which enables an intuitive configuration of the trade-off between the desired level of protection (i.e., controlled information disclosure) and the preservation of the utility of the protected data (i.e., the amount of semantics to be preserved). Moreover, we also present a set of technical solutions and algorithms that provide an efficient and scalable implementation of the model and improve its practical accuracy, as we illustrate through empirical experiments.
