We discuss how to build sparse one hidden layer MLP replacing the standard l_2 weight decay penalty on all weights by an l_1 penalty on the linear output weights. We will propose an iterative two step training procedure where the output weights are found using FISTA proximal optimization algorithm to solve a Lasso-like problem and the hidden weights are computed by unconstrained minimization. As we shall discuss, the procedure has a complexity equivalent to that of standard MLP training, yields MLPs with similar performance and, as a by product, automatically selects the number of hidden units.
展开▼