Articulatory trajectories for large-vocabulary speech recognition

Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

Abstract

Studies have demonstrated that articulatory information can model speech variability effectively and can potentially help to improve speech recognition performance. Most studies involving articulatory information have focused on estimating it effectively from speech, and few have actually used such features for speech recognition. Speech recognition studies using articulatory information have been mostly confined to digit or medium-vocabulary tasks, and efforts to incorporate it into large-vocabulary systems have been limited. We present a neural network model that estimates articulatory trajectories from speech signals; the model was trained using synthetic speech signals generated by Haskins Laboratories' task-dynamic model of speech production. The trained model was applied to natural speech, and the estimated articulatory trajectories were used in conjunction with standard cepstral features to train acoustic models for large-vocabulary recognition systems. Two different large-vocabulary English datasets were used in the experiments reported here. Results indicate that employing articulatory information improves speech recognition performance not only under clean conditions but also under noisy background conditions. Perceptually motivated robust features were also explored in this study, and the best performance was obtained when systems based on articulatory, standard cepstral, and perceptually motivated features were all combined.
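The abstract does not say how the estimated articulatory trajectories were combined with the cepstral features. One common option is frame-level concatenation, sketched below as an assumption rather than the authors' method. The function name `combine_features` and the toy feature dimensions are hypothetical, chosen only for illustration.

```python
def combine_features(cepstral, articulatory):
    """Concatenate two frame-aligned feature streams frame by frame.

    Both inputs are lists of frames; each frame is a list of floats.
    Assumption: the two streams were computed at the same frame rate,
    so frame i of one stream lines up with frame i of the other.
    """
    if len(cepstral) != len(articulatory):
        raise ValueError("both streams must have the same number of frames")
    return [list(c) + list(a) for c, a in zip(cepstral, articulatory)]


# Toy data: 3 frames, with 2 cepstral coefficients and
# 1 estimated articulatory value per frame (hypothetical sizes).
cepstral = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
articulatory = [[1.0], [2.0], [3.0]]

combined = combine_features(cepstral, articulatory)
```

Each combined frame would then serve as one input vector to acoustic model training. A real system would typically also normalize each stream before concatenation.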
