Recent Large Vocabulary Continuous Speech Recognition (LVCSR) systems apply different normalization methods to reduce the influence of speaker variability on the acoustic models. In this paper we investigate the use of Vocal Tract Length Normalization (VTLN) and Speaker Adaptive Training (SAT) in Multi-Layer Perceptron (MLP) feature extraction on an English task. We achieve significant improvements with each normalization method, and we gain further by stacking the normalizations. Further experiments with features transformed by Constrained Maximum Likelihood Linear Regression (CMLLR) based SAT as possible MLP input show that the MLP cannot consistently take advantage of SAT as it does in the case of VTLN.