International Conference on Privacy in Statistical Databases

Using Support Vector Machines for Generating Synthetic Datasets

Abstract

Generating synthetic datasets is an innovative approach to data dissemination. Values at risk of disclosure, or even the entire dataset, are replaced with multiple draws from statistical models. The quality of the released data strongly depends on how well these models capture the important relationships found in the original data. Defining useful models for complex survey data can be difficult and cumbersome. One possible way to reduce the modeling burden for data-disseminating agencies is to rely on machine learning tools to reveal important relationships in the data. This paper presents an initial investigation of whether support vector machines can be used to generate synthetic datasets. The application is limited to categorical data, but extensions to continuous data should be straightforward. I briefly describe the concept of support vector machines and the adjustments necessary for synthetic data generation. I evaluate the performance of the suggested algorithm using a real dataset, the IAB Establishment Panel. The results indicate that some improvements in data utility might be achievable with support vector machines. However, these improvements come at the price of an increased disclosure risk compared to standard parametric modeling, and more research is needed to find ways of reducing that risk. Some ideas for achieving this goal are provided in the discussion at the end of the paper.
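The abstract does not spell out the algorithm, so what follows is only a minimal, hypothetical sketch of the general idea: one categorical variable is replaced by draws from class probabilities estimated by support vector machines fitted on the other (one-hot encoded) variables, with a bootstrap refit for each of the m synthetic copies. All function names are illustrative. The SVM is a plain linear one trained with the Pegasos subgradient method, and the probabilities come from a softmax over one-vs-rest decision values. That softmax is a simplification chosen to keep the example dependency-free (libraries such as LIBSVM calibrate probabilities with Platt scaling instead), and neither choice should be read as the paper's actual adjustment.

```python
import math
import random


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style stochastic subgradient descent for a binary linear SVM.
    y holds labels in {-1, +1}; the returned weight vector ends with a bias term."""
    rng = random.Random(seed)
    Xb = [list(row) + [1.0] for row in X]  # constant feature acts as the bias
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    radius = 1.0 / math.sqrt(lam)
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage
            if margin < 1.0:  # hinge loss is active: step toward this point
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
            norm = math.sqrt(sum(wj * wj for wj in w))
            if norm > radius:  # optional Pegasos projection step
                w = [wj * radius / norm for wj in w]
    return w


def fit_one_vs_rest(X, labels, **kw):
    """Fit one binary SVM per category of the variable being synthesized."""
    classes = sorted(set(labels))
    models = {c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **kw)
              for c in classes}
    return classes, models


def class_probabilities(classes, models, x):
    """Softmax over one-vs-rest decision values (a stand-in for Platt scaling)."""
    xb = list(x) + [1.0]
    scores = [sum(wj * xj for wj, xj in zip(models[c], xb)) for c in classes]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def synthesize(X, labels, m=3, seed=1, **kw):
    """Return m synthetic copies of `labels`, each drawn from models refit on a
    bootstrap sample so that the copies also reflect model uncertainty."""
    rng = random.Random(seed)
    n = len(X)
    copies = []
    for _ in range(m):
        boot = [rng.randrange(n) for _ in range(n)]
        classes, models = fit_one_vs_rest([X[i] for i in boot],
                                          [labels[i] for i in boot], **kw)
        copies.append([rng.choices(classes, weights=class_probabilities(classes, models, x))[0]
                       for x in X])
    return copies


def one_hot(values, categories):
    """Encode a categorical predictor as 0/1 indicator columns."""
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]


if __name__ == "__main__":
    # Toy, made-up data: synthesize establishment size given sector.
    sector = ["manufacturing"] * 20 + ["services"] * 20
    size = ["small"] * 17 + ["large"] * 3 + ["large"] * 16 + ["small"] * 4
    X = one_hot(sector, ["manufacturing", "services"])
    for copy in synthesize(X, size, m=3):
        print(copy[:5])
```

Refitting on a bootstrap sample before each set of draws is one common way to carry parameter uncertainty into multiple synthetic datasets; fitting once and drawing m times from the same model would understate the variability between copies. The regularization weight `lam` is likewise an assumption of this sketch and controls how sharply the probabilities concentrate on the observed categories.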