Journal of Statistical Software

Simulated Data for Linear Regression with Structured and Sparse Penalties: Introducing pylearn-simulate


Abstract

A currently very active field of research is how to incorporate structure and prior knowledge in machine learning methods. It has led to numerous developments in the field of non-smooth convex minimization. With recently developed methods it is possible to perform an analysis in which the computed model can be linked to a given structure of the data and simultaneously do variable selection to find a few important features in the data. However, there is still no way to unambiguously simulate data to test proposed algorithms, since the exact solutions to such problems are unknown. The main aim of this paper is to present a theoretical framework for generating simulated data. These simulated data are appropriate when comparing optimization algorithms in the context of linear regression problems with sparse and structured penalties. Additionally, this approach allows the user to control the signal-to-noise ratio, the correlation structure of the data and the optimization problem to which they are the solution. The traditional approach is to simulate random data without taking into account the actual model that will be fit to the data. But when using such an approach it is not possible to know the exact solution of the underlying optimization problem. With our contribution, it is possible to know the exact theoretical solution of a penalized linear regression problem, and it is thus possible to compare algorithms without the need to use, e.g., cross-validation. We also present our implementation, the Python package pylearn-simulate, available at https://github.com/neurospin/pylearn-simulate and released under the BSD 3-clause license. We describe the package and give examples at the end of the paper.
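To make the central idea concrete, the following is a minimal sketch for the simplest case, a pure ℓ1 (lasso) penalty. It is not the pylearn-simulate API and is not taken from the paper: the function names `simulate_lasso_data` and `lasso_objective`, the column-rescaling construction, and all parameter values are assumptions chosen for illustration. The sketch picks a target vector `beta_star`, a residual `e` and a subgradient `s` of the ℓ1 norm at `beta_star`, then rescales the columns of a random candidate matrix so that the optimality condition X^T(X beta_star - y) + lam * s = 0 holds exactly. The generated data therefore have `beta_star` as a known minimizer of 0.5 * ||X b - y||^2 + lam * ||b||_1.

```python
import numpy as np


def simulate_lasso_data(beta_star, n, lam, seed=0):
    """Return (X, y, s) such that beta_star minimizes
    0.5 * ||X b - y||^2 + lam * ||b||_1.

    Hypothetical helper for illustration; not part of pylearn-simulate.
    """
    rng = np.random.default_rng(seed)
    beta_star = np.asarray(beta_star, dtype=float)
    p = beta_star.size

    X0 = rng.standard_normal((n, p))  # candidate design matrix
    e = rng.standard_normal(n)        # residual, e = X beta_star - y

    # Subgradient of the l1 norm at beta_star: sign(beta_j) where
    # beta_j != 0, any value in [-1, 1] where beta_j == 0.
    s = np.sign(beta_star)
    zero = beta_star == 0
    s[zero] = rng.uniform(-1.0, 1.0, size=int(zero.sum()))

    # Rescale column j so that X[:, j] @ e == -lam * s[j], which is the
    # first-order optimality condition of the lasso at beta_star.
    alpha = -lam * s / (X0.T @ e)
    X = X0 * alpha

    y = X @ beta_star - e
    return X, y, s


def lasso_objective(X, y, beta, lam):
    """Evaluate 0.5 * ||X beta - y||^2 + lam * ||beta||_1."""
    r = X @ beta - y
    return 0.5 * r @ r + lam * np.abs(beta).sum()


# Usage: check the optimality condition at the known solution.
beta_star = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
X, y, s = simulate_lasso_data(beta_star, n=50, lam=0.7)
gradient = X.T @ (X @ beta_star - y)
print(np.allclose(gradient + 0.7 * s, 0.0))
```

Because the problem is convex, satisfying this condition is enough for `beta_star` to be a global minimizer, so an optimization algorithm can be scored by its distance to `beta_star` directly. Note that the per-column rescaling in this sketch changes the scale of the candidate matrix and does not by itself control the signal-to-noise ratio or the correlation structure, which the abstract lists as features of the full framework.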
