Published in: International Conference on Signal Processing and Communication Systems

Some Experiments on Context Mismatched Speech Recognition




To be effective, an automatic speech recognition (ASR) system must normalize several kinds of intra- and inter-speaker variability, as well as session, channel, and ambiance differences. Speaker-related variability factors include gender, age, accent, emotion, and speaking rate. To address these sources of variability, speech data from a large number of speakers covering varied conditions is pooled together to train the context-dependent triphone models. Furthermore, several feature-space normalization and speaker-space adaptation techniques are incorporated during system development. Another important source of mismatch is the frequency of occurrence of triphone contexts in the training and test data. In the case of hidden Markov modeling, regression-tree-based state tying is performed to model the seen contexts and to deal with unseen ones. When the trained triphones occur less frequently in the test data, or are absent from it, recognition performance degrades. In this paper, we present our efforts to improve the performance of such context-mismatched ASR tasks. In this regard, we explore the effect of varying the number of senones on recognition performance. It is hypothesized that using a lower number of senones is beneficial in such cases.
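The context mismatch described in the abstract can be made concrete with a small counting exercise. The following is a minimal sketch, not the authors' method: it assumes each utterance is already available as a list of phone labels, pads utterance edges with a hypothetical `sil` symbol, and uses the common `left-center+right` triphone naming. The function names `triphones` and `context_mismatch` are illustrative only.

```python
from collections import Counter


def triphones(phones):
    """Return left-center+right triphone labels for one phone sequence.

    Utterance edges are padded with 'sil' (an assumption of this sketch).
    """
    padded = ["sil"] + list(phones) + ["sil"]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


def context_mismatch(train_utts, test_utts):
    """Summarize how triphone contexts differ between training and test data."""
    train_counts = Counter(t for utt in train_utts for t in triphones(utt))
    test_counts = Counter(t for utt in test_utts for t in triphones(utt))

    # Share of test triphone tokens whose context never occurs in training.
    total_test = sum(test_counts.values())
    unseen = sum(c for t, c in test_counts.items() if t not in train_counts)

    # Share of distinct trained triphones that are absent from the test data.
    unused = sum(1 for t in train_counts if t not in test_counts)

    return {
        "unseen_test_token_rate": unseen / total_test if total_test else 0.0,
        "unused_train_type_rate": unused / len(train_counts) if train_counts else 0.0,
    }


# Toy phone sequences, invented purely for illustration.
train = [["k", "ae", "t"], ["b", "ae", "t"]]
test = [["k", "ae", "t"], ["m", "ae", "t"]]
result = context_mismatch(train, test)
```

A high `unused_train_type_rate` corresponds to the situation the abstract highlights, where many trained triphones rarely or never appear at test time. Under the paper's hypothesis, tying states more aggressively (fewer senones) would be the setting to try in that regime.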


