Sparse representations account for most or all of the information of a signal with a linear combination of a few elementary signals called atoms, and have increasingly become recognized as providing high performance for applications as diverse as noise reduction, compression, inpainting, compressive sensing, pattern classification, and blind source separation. In this dissertation, we learn the sparse representations of high-dimensional signals for various learning and vision tasks, including image classification, single image super-resolution, compressive sensing, and graph learning.

Based on the bag-of-features (BoF) image representation in a spatial pyramid, we first transform each local image descriptor into a sparse representation, and then summarize these sparse representations into a fixed-length feature vector by max pooling over different spatial locations across different spatial scales (see the pooling sketch below). The proposed generic image feature representation properly handles the large in-class variance problem in image classification, and experiments on object recognition, scene classification, face recognition, gender recognition, and handwritten digit recognition all achieve state-of-the-art performance on the benchmark datasets.

We cast the image super-resolution problem as one of recovering a high-resolution image patch for each low-resolution image patch, based on recent sparse signal recovery theories, which state that, under mild conditions, a high-resolution signal can be recovered from its low-resolution version if the signal has a sparse representation in terms of some dictionary. We jointly learn the dictionaries for high- and low-resolution image patches and enforce them to share common sparse representations for better recovery (see the recovery sketch below). Furthermore, we employ image features and enforce patch overlapping constraints to improve prediction accuracy. Experiments show that the algorithm leads to surprisingly good results.

Graph construction is critical for graph-oriented algorithms designed for data clustering, subspace learning, and semi-supervised learning. We model the graph construction problem, including neighbor selection and weight assignment, by finding the sparse representation of a data sample with respect to all other data samples (see the graph sketch below). Since natural signals are high-dimensional signals of low intrinsic dimension, projecting a signal onto the nearest, lowest-dimensional linear subspace is more likely to find its kindred neighbors, and therefore improves the graph quality by avoiding many spurious connections. The proposed graph is informative, sparse, robust to noise, and adaptive to the neighborhood selection; it exhibits exceptionally high performance in various graph-based applications.

Finally, we propose a generic dictionary training algorithm that learns more meaningful sparse representations for the above tasks. The dictionary learning algorithm is formulated as a bilevel optimization problem (see the formulation below), which we prove can be solved using stochastic gradient descent. Applications of the generic dictionary training algorithm to supervised dictionary training for image classification, super-resolution, and compressive sensing demonstrate its effectiveness in sparse modeling of natural signals.
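To make the classification pipeline concrete, here is a minimal sketch of sparse coding followed by spatial-pyramid max pooling. It is an illustration under assumed settings, not the dissertation's implementation: the dictionary is random (it would normally be learned), and the descriptor dimension, dictionary size, sparsity weight, and pyramid levels are all hypothetical.

```python
# Minimal sketch of sparse coding + spatial-pyramid max pooling.
# Assumptions: 128-D descriptors (SIFT-like), a 1024-atom dictionary
# (random here; it would be learned), 1x1/2x2/4x4 pyramid, lasso penalty.
import numpy as np
from sklearn.decomposition import sparse_encode

rng = np.random.default_rng(0)
D = rng.standard_normal((1024, 128))
D /= np.linalg.norm(D, axis=1, keepdims=True)  # unit-norm atoms

def pooled_feature(descriptors, positions, levels=(1, 2, 4), alpha=0.15):
    """descriptors: (n, 128); positions: (n, 2) in [0, 1)^2 -> one vector."""
    codes = np.abs(sparse_encode(descriptors, D, algorithm='lasso_lars',
                                 alpha=alpha))
    feats = []
    for g in levels:                                  # pyramid grids
        cell = (positions * g).astype(int).clip(0, g - 1)
        for cx in range(g):
            for cy in range(g):
                m = (cell[:, 0] == cx) & (cell[:, 1] == cy)
                # max pooling: strongest response of each atom in the cell
                feats.append(codes[m].max(axis=0) if m.any()
                             else np.zeros(D.shape[0]))
    return np.concatenate(feats)

x = pooled_feature(rng.standard_normal((500, 128)), rng.random((500, 2)))
print(x.shape)  # ((1 + 4 + 16) * 1024,) = (21504,)
```

Max pooling keeps, per pyramid cell, the strongest response of each atom, which is what makes the final fixed-length vector robust to in-class variation in where descriptors fall.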
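The super-resolution recovery step can be sketched as follows, assuming coupled dictionaries D_l and D_h that were trained jointly so that a low-/high-resolution patch pair shares one sparse code; the dictionary and patch sizes are hypothetical, and the full method additionally uses image features of the low-resolution patches and averages overlapping high-resolution patches for consistency.

```python
# Minimal sketch of super-resolution recovery with coupled dictionaries.
# Assumptions: D_l, D_h jointly trained so a patch pair shares one sparse
# code; sizes below (512 atoms, 6x6 -> 9x9 patches) are hypothetical.
import numpy as np
from sklearn.decomposition import sparse_encode

def sr_patch(y_low, D_l, D_h, alpha=0.1):
    """y_low: (d_l,) low-res patch feature -> (d_h,) high-res patch."""
    # Sparse code of the low-res patch over the low-res dictionary ...
    a = sparse_encode(y_low[None, :], D_l, algorithm='lasso_lars', alpha=alpha)
    # ... directly synthesizes the high-res patch, by the shared-code assumption.
    return (a @ D_h).ravel()

rng = np.random.default_rng(0)
D_l = rng.standard_normal((512, 36))   # stand-ins for learned dictionaries
D_h = rng.standard_normal((512, 81))
print(sr_patch(rng.standard_normal(36), D_l, D_h).shape)  # (81,)
```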
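The graph construction reduces, per sample, to one sparse decomposition over all other samples. A minimal sketch follows, with an assumed sparsity weight and with symmetrization as one plausible post-processing choice.

```python
# Minimal sketch of the l1-graph: each sample is sparsely coded over all
# other samples; code magnitudes give the edge weights. alpha is assumed.
import numpy as np
from sklearn.linear_model import Lasso

def l1_graph(X, alpha=0.05):
    """X: (n, d) data matrix -> (n, n) symmetric sparse affinity matrix."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        # One lasso solve does both neighbor selection and weight assignment.
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        W[i, others] = np.abs(lasso.fit(X[others].T, X[i]).coef_)
    return np.maximum(W, W.T)  # one plausible symmetrization

rng = np.random.default_rng(0)
print(l1_graph(rng.standard_normal((40, 10))).shape)  # (40, 40)
```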
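A generic bilevel form consistent with the description above is the following, where the lower level is standard l1 sparse coding and the upper level minimizes a task-specific loss over the dictionary D; the exact losses and constraints are task-dependent and not spelled out in this abstract.

```latex
\min_{D}\ \mathbb{E}_{(x,y)}\!\left[\ell\bigl(y,\ \alpha^{*}(x,D)\bigr)\right]
\quad\text{s.t.}\quad
\alpha^{*}(x,D)=\arg\min_{\alpha}\ \tfrac{1}{2}\lVert x-D\alpha\rVert_{2}^{2}
 +\lambda\lVert\alpha\rVert_{1}.
```

Stochastic gradient descent applies because \(\alpha^{*}(x,D)\) is differentiable with respect to D almost everywhere (the active set of the l1 solution is locally constant), so each training pair yields a gradient step on D.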