Hierarchical Multilayer Perceptron (MLP) based long-term feature extraction is optimized for a TANDEM connectionist large vocabulary continuous speech recognition (LVCSR) system within the QUAERO project. When the bottleneck MLP is trained on multi-resolutional RASTA-filtered critical band energies and the number of target labels is optimized, a relative word error rate (WER) reduction of more than 20% over a standard MFCC system is observed. Introducing a deeper structure into the hierarchical bottleneck processing further increases the relative gain to 25%. The final system, based on deep bottleneck TANDEM features, clearly outperforms the hybrid approach, even when the long-term features are also presented to the deep MLP acoustic model. The results are also verified on the 2012 evaluation data, where a relative WER improvement of about 20% over the classical cepstral system is measured even after speaker adaptive training.
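The gains above are stated as relative WER reductions. These are conventionally computed as the drop in WER divided by the baseline WER. For example, a 25% relative gain on a hypothetical 20% baseline WER gives 15% WER, which is 5 points absolute. The following is a minimal Python sketch of that arithmetic. The function name and all WER values are hypothetical illustrations, not results from this work.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers for illustration only (not results reported in this work):
baseline = 20.0  # assumed WER (%) of a cepstral (MFCC) baseline system
tandem = 15.0    # assumed WER (%) of a deep bottleneck TANDEM system
print(f"{relative_wer_reduction(baseline, tandem):.1f}% relative")  # 25.0% relative
```

Because the measure is relative, the same absolute improvement counts for more on a system that already has a low baseline WER.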