IEEE International Conference on Machine Learning and Applications

Resampling-Based Variable Selection with Lasso for p ≫ n and Partially Linear Models



Abstract

When proposing consistent variable selection methods for large, high-dimensional datasets, a linear model of the regression function is a widely used and, in most cases, highly unrealistic simplifying assumption. In this paper, we study, from a theoretical point of view, what happens when a variable selection method assumes a linear regression function while the underlying ground-truth model is composed of a linear and a non-linear term, i.e., is at most partially linear. We demonstrate the consistency of the Lasso method when the model is partially linear. However, we note that, given few training samples, the algorithm tends to select even more false positives on partially linear models. This is usually because the values of small groups of samples happen to explain, through a linear combination of wrong predictors, variation coming from the non-linear part of the response function and from the noise. We show theoretically that the Lasso is likely to select false positives because of small proportions of samples that happen to explain some of the variation in the response variable. This property implies that if we run the Lasso on several slightly smaller replications of the data, sampled without replacement, and intersect the selected variable sets, we are likely to reduce the number of false positives without losing already selected true positives. We propose a novel consistent variable selection algorithm based on this property and show that it can outperform other variable selection methods on synthetic datasets drawn from linear and partially linear models, as well as on datasets from the UCI machine learning repository.
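The paper's exact algorithm is not reproduced on this page; the following is only a minimal sketch of the idea the abstract describes, using scikit-learn's `Lasso`. The function name `intersection_lasso` and the parameters `n_reps`, `subsample_frac`, and `alpha` are illustrative assumptions, not the authors' API.

```python
import numpy as np
from sklearn.linear_model import Lasso

def intersection_lasso(X, y, n_reps=10, subsample_frac=0.9, alpha=0.1, seed=0):
    """Fit the Lasso on several slightly smaller replications of the data,
    each sampled without replacement, and intersect the selected supports.
    Spurious predictors picked up by only a few subsamples are pruned."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = int(subsample_frac * n)
    support = None
    for _ in range(n_reps):
        idx = rng.choice(n, size=m, replace=False)   # subsample w/o replacement
        coef = Lasso(alpha=alpha).fit(X[idx], y[idx]).coef_
        selected = set(np.flatnonzero(coef))         # nonzero coefficients
        support = selected if support is None else support & selected
    return sorted(support)

# Synthetic partially linear model: y is linear in features 0 and 1,
# non-linear in feature 2, with additive Gaussian noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
y = 3 * X[:, 0] - 2 * X[:, 1] + np.sin(X[:, 2]) + 0.1 * rng.normal(size=200)
print(intersection_lasso(X, y))
```

On this toy example the strong linear predictors (features 0 and 1) survive the intersection, while variables that only spuriously explain the non-linear variation in a few subsamples tend to be dropped.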
