A key step in many statistical learning methods used in machine learning involves solving a convex optimization problem containing one or more hyper-parameters that must be selected by the user. While cross-validation is a commonly employed and widely accepted method for selecting these parameters, its implementation by a grid-search procedure over the parameter space effectively limits the number of hyper-parameters a model can carry, because the number of grid points grows combinatorially with dimension. A novel paradigm based on bilevel optimization is proposed; it gives rise to a unifying framework within which issues such as model selection can be addressed.

The machine learning problem is formulated as a bilevel program, that is, a mathematical program whose constraints are functions of the optimal solutions of another mathematical program, called the inner-level program. The bilevel program is transformed into an equivalent mathematical program with equilibrium constraints (MPEC). Two alternative bilevel optimization algorithms are developed to optimize the MPEC and provide a systematic search of the hyper-parameters.

In the first approach, the equilibrium constraints of the MPEC are relaxed to form a nonlinear program with a linear objective and non-convex quadratic inequality constraints, which is then solved using a general-purpose nonlinear programming solver. In the second approach, the equilibrium constraints are treated as penalty terms in the objective, and the resulting non-convex quadratic program with linear constraints is solved using a successive linearization algorithm.

The flexibility of the bilevel approach in dealing with multiple hyper-parameters makes it a powerful approach to problems such as parameter and feature selection (model selection). In this thesis, three problems are studied: model selection for support vector (SV) classification, model selection for SV regression, and missing-value imputation for SV regression.
Extensive computational results establish that both algorithmic approaches find solutions that generalize as well as or better than those of conventional approaches, and that they are much more computationally efficient.
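As a rough illustration of the formulation described above, the bilevel program, its MPEC reformulation, and the two treatments of the equilibrium constraints can be sketched in generic form. All symbols here are assumptions introduced for illustration and are not taken from the thesis: $\lambda$ stands for the hyper-parameters, $w$ for the model parameters, $F$ for the outer (validation) objective, $f$ and $g$ for the inner (training) objective and constraints, $\alpha$ for the inner-level multipliers, and $\mathrm{tol}$ and $\rho$ for a relaxation tolerance and a penalty weight. The KKT step assumes a convex inner problem and a suitable constraint qualification.

```latex
% Generic bilevel program (illustrative notation, not the thesis's own).
\begin{align*}
\min_{\lambda,\, w}\quad & F(\lambda, w)
  && \text{(outer level, e.g. validation error)}\\
\text{s.t.}\quad & \lambda \in \Lambda,\\
& w \in \arg\min_{v}\ \{\, f(\lambda, v) \;:\; g(\lambda, v) \le 0 \,\}
  && \text{(inner level, e.g. training problem)}
\end{align*}

% Replacing the convex inner problem by its KKT conditions yields an MPEC;
% the last line is the equilibrium (complementarity) constraint.
\begin{align*}
& \nabla_w f(\lambda, w) + \nabla_w g(\lambda, w)^{\top} \alpha = 0,\\
& 0 \le \alpha \;\perp\; -g(\lambda, w) \ge 0 .
\end{align*}

% First approach (relaxation): keep the sign conditions and bound the product.
\begin{align*}
\alpha \ge 0, \qquad -g(\lambda, w) \ge 0, \qquad
-\alpha^{\top} g(\lambda, w) \le \mathrm{tol}, \qquad \mathrm{tol} \ge 0 .
\end{align*}

% Second approach (penalty): move the product into the objective.
\begin{align*}
\min_{\lambda,\, w,\, \alpha}\quad
F(\lambda, w) \;-\; \rho\, \alpha^{\top} g(\lambda, w)
\quad \text{s.t.}\quad \alpha \ge 0,\ \ -g(\lambda, w) \ge 0,\ \ \text{stationarity as above.}
\end{align*}
```

When $g$ is linear in $(\lambda, w)$, the product $\alpha^{\top} g$ is bilinear, which is consistent with the non-convex quadratic constraint in the first approach and the non-convex quadratic objective in the second.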