Ridge regression is one of the most popular and effective regularized regression methods, and one case of particular interest is that the number of features p is much larger than the number of samples n, i.e. p (>>) n. In this case, the standard optimization algorithm for ridge regression computes the optimal solution x~* in O(n~2p + n~3) time. In this paper, we propose a fast relative-error approximation algorithm for ridge regression. More specifically, our algorithm outputs a solution x satisfying ||x - x~*||_2 ≤ ε||x~*||_2 with high probability and runs in O(nnz( A) + n~3/ε~2) time, where nnz(A) is the number of non-zero entries of matrix A. To the best of our knowledge, this is the first algorithm for ridge regression that runs in o(n~2p) time with provable relative-error approximation bound on the output vector. In addition, we analyze the risk inflation bound of our algorithm and apply our techniques to two generalizations of ridge regression, including multiple response ridge regression and a non-linear ridge regression problem. Finally, we show empirical results on both synthetic and real datasets.
展开▼