Traditional automatic speech recognition (ASR) systems usually suffer a sharp performance drop when noise is present in speech. To improve ASR robustness, we introduce a new model based on multi-task learning deep neural networks (MTL-DNN) that performs speech denoising at the feature level. In this model, the network is initialized by pre-training restricted Boltzmann machines (RBM) and fine-tuned by jointly learning multiple interactive tasks with a shared representation. In the multi-task learning setup, we choose a noisy-clean speech pair fitting task as the primary task and separately explore two constraints as secondary tasks: phone label and phone cluster. In experiments, the denoised speech is reconstructed by the MTL-DNN from the noisy speech input and is evaluated by both a DNN-hidden Markov model (HMM) based and a Gaussian Mixture Model (GMM)-HMM based ASR system. Results show that, using the denoised speech, the word error rate (WER) is reduced by 53.14% and 34.84% respectively compared with the baselines. The MTL-DNN model also outperforms the conventional single-task learning deep neural network (STL-DNN) model, with performance improvements of 4.93% and 3.88% respectively.
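The following is a minimal sketch of the multi-task setup described above, assuming a PyTorch-style implementation; the layer sizes, loss weight, phone inventory size, and variable names are hypothetical, and the RBM pre-training stage is omitted. It illustrates the shared representation feeding a primary noisy-clean fitting head and a secondary phone-classification head, not the paper's exact configuration.

```python
# Sketch of an MTL-DNN feature-level denoiser (assumed PyTorch implementation;
# dimensions, the loss weight alpha, and the number of phones are hypothetical).
import torch
import torch.nn as nn

class MTLDenoiser(nn.Module):
    def __init__(self, feat_dim=40, hidden_dim=1024, num_phones=40):
        super().__init__()
        # Shared representation computed from noisy input features
        self.shared = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Primary task: fit the clean speech features (noisy-clean pair fitting)
        self.denoise_head = nn.Linear(hidden_dim, feat_dim)
        # Secondary task: predict the phone label (or phone cluster)
        self.phone_head = nn.Linear(hidden_dim, num_phones)

    def forward(self, noisy_feats):
        h = self.shared(noisy_feats)
        return self.denoise_head(h), self.phone_head(h)

model = MTLDenoiser()
mse = nn.MSELoss()            # primary loss: clean-feature regression
ce = nn.CrossEntropyLoss()    # secondary loss: phone classification
alpha = 0.3                   # hypothetical weight on the secondary task

noisy = torch.randn(8, 40)             # batch of noisy feature frames
clean = torch.randn(8, 40)             # parallel clean feature frames
phones = torch.randint(0, 40, (8,))    # frame-level phone labels

denoised, phone_logits = model(noisy)
loss = mse(denoised, clean) + alpha * ce(phone_logits, phones)
loss.backward()
```

At test time, only the output of the primary denoising head would be passed to the downstream DNN-HMM or GMM-HMM recognizer; the phone heads serve solely as auxiliary constraints during training.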