Journal: Theoretical and Experimental Plant Physiology

Scalable Iterative Classification for Sanitizing Large-Scale Datasets

Abstract

Cheap, ubiquitous computing enables the collection of massive amounts of personal data in a wide variety of domains. Many organizations aim to share such data while obscuring features that could disclose personally identifiable information. Much of this data exhibits weak structure (e.g., text), so machine learning approaches have been developed to detect and remove identifiers from it. Learning is never perfect, and relying on such approaches to sanitize data can leak sensitive information, but a small risk is often acceptable. Our goal is to balance the value of published data against the risk of an adversary discovering leaked identifiers. We model data sanitization as a game between (1) a publisher who chooses a set of classifiers to apply to data and publishes only instances predicted as non-sensitive and (2) an attacker who combines machine learning and manual inspection to uncover leaked identifying information. We introduce a fast iterative greedy algorithm for the publisher that ensures a low utility for a resource-limited adversary. Using five text data sets, we show that our algorithm leaves virtually no automatically identifiable sensitive instances for a state-of-the-art learning algorithm, while sharing over 93 percent of the original data, and completes after at most five iterations.
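The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of iterative sanitization, not the authors' method. It assumes the publisher holds labeled data, and it stands in a toy token-count learner for the real classifiers. The function names, the `max_iters` parameter, and the stopping rule are all hypothetical. Each round trains a classifier on what is still publishable, withholds every instance it flags, and stops once no sensitive instance is left or the learner can no longer flag anything.

```python
from collections import Counter


def train_token_classifier(texts, labels):
    """Toy learner (an assumption, not the paper's classifier): flag any token
    that occurs in more sensitive than non-sensitive instances."""
    sens, non = Counter(), Counter()
    for text, is_sensitive in zip(texts, labels):
        (sens if is_sensitive else non).update(set(text.lower().split()))
    flagged = {tok for tok, count in sens.items() if count > non[tok]}
    return lambda text: any(tok in flagged for tok in text.lower().split())


def iterative_sanitize(texts, labels, max_iters=5):
    """Repeatedly train on the remaining data and withhold flagged instances.

    Returns the texts that would be published and the number of rounds run.
    """
    remaining = list(zip(texts, labels))
    rounds = 0
    for _ in range(max_iters):
        if not any(label for _, label in remaining):
            break  # no sensitive instance is left to learn from
        clf = train_token_classifier(
            [t for t, _ in remaining], [l for _, l in remaining]
        )
        kept = [(t, l) for t, l in remaining if not clf(t)]
        if len(kept) == len(remaining):
            break  # the learner can no longer flag anything automatically
        remaining = kept
        rounds += 1
    return [t for t, _ in remaining], rounds


# Hypothetical toy data: True marks an instance that contains an identifier.
instances = [
    "mark smith visited clinic",
    "nurse called mark today",
    "please mark room clean",
    "room seven",
    "seven beds free",
    "clinic opens today",
]
labels = [True, True, False, True, False, False]

published, rounds = iterative_sanitize(instances, labels)
```

On this toy data the first round also withholds the harmless "please mark room clean", because "mark" looks like a name to the learner. That removal is what lets the second round flag "room seven". This illustrates the trade-off the abstract describes between the share of data published and the risk of leaked identifiers.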