Conference: Privacy in Statistical Databases

Using Support Vector Machines for Generating Synthetic Datasets


Abstract

Generating synthetic datasets is an innovative approach to data dissemination. Values at risk of disclosure, or even the entire dataset, are replaced with multiple draws from statistical models. The quality of the released data strongly depends on the ability of these models to capture important relationships found in the original data. Defining useful models for complex survey data can be difficult and cumbersome. One possible way to reduce the modeling burden for data-disseminating agencies is to rely on machine learning tools to reveal important relationships in the data. This paper contains an initial investigation of whether support vector machines could be used to develop synthetic datasets. The application is limited to categorical data, but extensions to continuous data should be straightforward. I briefly describe the concept of support vector machines and the adjustments necessary for synthetic data generation. I evaluate the performance of the suggested algorithm using a real dataset, the IAB Establishment Panel. The results indicate that some data utility improvements might be achievable using support vector machines. However, these improvements come at the price of an increased disclosure risk compared to standard parametric modeling, and more research is needed to find ways to reduce this risk. Some ideas for achieving this goal are provided in the discussion at the end of the paper.
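The idea of replacing a sensitive categorical value with multiple draws from a fitted model can be illustrated with a minimal sketch. It is not the paper's algorithm: a simple frequency table stands in for the support vector machine as the estimator of the conditional class probabilities, and the function names (`fit_conditional`, `synthesize`), the variable names, and the toy records are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_conditional(records, target, predictors):
    """Estimate P(target | predictors) from observed frequencies.

    Assumption: this frequency table is a stand-in for whatever model
    (for example an SVM with probability outputs) supplies the
    conditional class probabilities.
    """
    table = defaultdict(Counter)
    for rec in records:
        key = tuple(rec[p] for p in predictors)
        table[key][rec[target]] += 1
    return table


def synthesize(records, target, predictors, m=3, seed=0):
    """Return m synthetic copies with `target` replaced by random draws."""
    rng = random.Random(seed)
    table = fit_conditional(records, target, predictors)
    copies = []
    for _ in range(m):
        copy = []
        for rec in records:
            counts = table[tuple(rec[p] for p in predictors)]
            categories = list(counts)
            weights = [counts[c] for c in categories]
            # Draw one category in proportion to its estimated probability.
            drawn = rng.choices(categories, weights=weights, k=1)[0]
            copy.append({**rec, target: drawn})
        copies.append(copy)
    return copies


# Toy records (hypothetical; not taken from the IAB Establishment Panel).
records = [
    {"sector": "retail", "size": "small", "exports": "no"},
    {"sector": "retail", "size": "small", "exports": "no"},
    {"sector": "retail", "size": "small", "exports": "yes"},
    {"sector": "manufacturing", "size": "large", "exports": "yes"},
    {"sector": "manufacturing", "size": "large", "exports": "yes"},
    {"sector": "manufacturing", "size": "large", "exports": "no"},
]

synthetic = synthesize(records, target="exports",
                       predictors=["sector", "size"], m=3)
```

Each of the `m` copies keeps the predictor values and replaces only the target variable, so releasing several copies lets an analyst account for the extra uncertainty introduced by the random draws. A richer model in place of the frequency table may capture more relationships in the data, which is the trade-off between utility and disclosure risk that the abstract describes.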
