Learning representations from massive unlabeled data is an active topic for high-level tasks in many applications. The recent substantial improvements on benchmark data sets, achieved by increasingly complex unsupervised learning methods and deep learning models with large numbers of parameters, usually require many tedious tricks and considerable expertise to tune. However, the filters learned by these complex architectures are visually quite similar to standard hand-crafted features, and training such deep models takes a long time to fine-tune the weights. In this paper, the Extreme Learning Machine-Autoencoder (ELM-AE) is employed as the learning unit to learn local receptive fields at each layer, and the lower-layer responses are transferred to the last layer (trans-layer) to form a more complete representation that retains more information. In addition, techniques that have proved beneficial in deep learning architectures, such as local contrast normalization and whitening, are added to the proposed hierarchical Extreme Learning Machine networks to further boost performance. The resulting trans-layer representations are then combined with binary hashing and block histograms to obtain translation- and rotation-invariant representations, which are used for high-level tasks such as recognition and detection. Compared with traditional deep learning methods, the proposed trans-layer representation method with ELM-AE-based learning of local receptive fields trains much faster, and it is validated on several typical benchmarks, including digit recognition on MNIST and the MNIST variations and object recognition on Caltech 101. State-of-the-art performance is achieved on the Caltech 101 15-samples-per-class task and on 4 of the 6 MNIST variation data sets, and highly competitive results are obtained on the MNIST data set and the other tasks.
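As a rough illustration of the ELM-AE learning unit mentioned above, the following minimal sketch solves the autoencoder's output weights in closed form by ridge regression on top of a random, orthogonalized hidden projection; the transposed output weights then serve as learned filters. The function name `elm_ae` and all parameter choices here are hypothetical, not the paper's code:

```python
import numpy as np

def elm_ae(X, n_hidden=16, C=1e3, rng=None):
    """Minimal ELM-Autoencoder sketch (hypothetical helper).

    A random orthogonalized projection maps X to a hidden layer H;
    the output weights beta are solved in closed form (ridge
    regression) so that H @ beta reconstructs X. beta can then be
    used as a bank of learned filters for feature mapping.
    """
    rng = np.random.default_rng(rng)
    n_features = X.shape[1]
    # Random input weights with orthonormal columns
    # (assumes n_features >= n_hidden).
    W = rng.standard_normal((n_features, n_hidden))
    W, _ = np.linalg.qr(W)
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer activations
    # Closed-form ridge solution: beta = (H^T H + I/C)^{-1} H^T X
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ X)
    return beta  # shape (n_hidden, n_features): learned filters

# Usage sketch: learn filters from image patches, then map patches
# to features with the transposed filters.
# filters = elm_ae(patches)
# features = patches @ filters.T
```

Because the output weights are obtained analytically rather than by gradient descent, each layer is trained in a single pass, which is the source of the speed advantage claimed over iteratively fine-tuned deep models.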