Vocal Tract Length Normalization for Large Vocabulary Continuous Speech Recognition

Abstract

In general, the speaker dependence of a speech recognition system stems from speaker-dependent speech features. Variation in vocal tract length and/or shape is one of the major sources of inter-speaker variation. In this paper, we address several methods of vocal tract length normalization (VTLN) for large vocabulary continuous speech recognition: (1) we explore bilinear warping VTLN in the frequency domain; (2) we propose a speaker-specific Bark/Mel scale VTLN in the Bark/Mel domain; and (3) we investigate adaptation of the normalization factor. Our experimental results show that the speaker-specific Bark/Mel scale VTLN outperforms piecewise/bilinear warping VTLN in the frequency domain. It reduces the word error rate by up to 12% on our Spanish and English spontaneous speech scheduling task database. For adaptation of the normalization factor, our experiments show that promising results can be obtained by using no more than three utterances from a new speaker to estimate his/her normalization factor, and that the unsupervised adaptation mode works as well as the supervised one. Therefore, the computational complexity of VTLN can be avoided by learning the normalization factor from very few utterances of a new speaker.
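
The abstract does not state the warping formula or the estimation procedure, so the following is only a minimal sketch under stated assumptions. It uses one commonly cited form of bilinear frequency warping, in which a normalized angular frequency ω in [0, π] is mapped to ω + 2·arctan(α·sin ω / (1 − α·cos ω)) for a warping factor |α| < 1, and it picks a speaker's factor by a plain grid search over a few utterances. The function names, the candidate grid, and the `score` callback (standing in for whatever likelihood a recognizer would supply) are hypothetical and not taken from the paper.

```python
import math

def bilinear_warp(omega, alpha):
    """Warp a normalized angular frequency omega in [0, pi].

    Assumed form of the bilinear warp (not confirmed by the abstract);
    alpha is the warping factor and must satisfy |alpha| < 1.
    alpha = 0 leaves the frequency axis unchanged.
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))

def warp_frequency_axis(num_bins, alpha):
    """Return warped positions of num_bins evenly spaced points on [0, pi]."""
    step = math.pi / (num_bins - 1)
    return [bilinear_warp(i * step, alpha) for i in range(num_bins)]

def pick_warp_factor(utterances, score, candidates):
    """Grid search: return the candidate factor with the highest total score.

    `score(utterance, alpha)` is a hypothetical callback, e.g. the
    log-likelihood of the warped utterance under an acoustic model.
    """
    return max(candidates,
               key=lambda alpha: sum(score(u, alpha) for u in utterances))
```

Under these assumptions, a call such as `pick_warp_factor(first_three_utterances, score, [-0.10, -0.05, 0.0, 0.05, 0.10])` would estimate the factor once from a handful of utterances, after which the same factor could be reused for the rest of that speaker's speech instead of being searched again per utterance.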
