Articulatory trajectories for large-vocabulary speech recognition


Abstract

Studies have demonstrated that articulatory information can model speech variability effectively and can potentially help to improve speech recognition performance. Most studies involving articulatory information have focused on estimating it effectively from speech, and few have actually used such features for speech recognition. Speech recognition studies using articulatory information have been mostly confined to digit or medium-vocabulary tasks, and efforts to incorporate such information into large-vocabulary systems have been limited. We present a neural network model that estimates articulatory trajectories from speech signals; the model was trained on synthetic speech generated by Haskins Laboratories' task-dynamic model of speech production. The trained model was applied to natural speech, and the estimated articulatory trajectories were used in conjunction with standard cepstral features to train acoustic models for large-vocabulary recognition systems. Two different large-vocabulary English datasets were used in the experiments reported here. Results indicate that employing articulatory information improves speech recognition performance not only under clean conditions but also under noisy background conditions. Perceptually motivated robust features were also explored in this study, and the best performance was obtained when systems based on articulatory, standard cepstral, and perceptually motivated features were all combined.
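
As a rough illustration of the pipeline the abstract describes, the sketch below passes each cepstral frame through a small feed-forward network and appends the estimated trajectory values to that frame. The abstract does not say how the two feature streams are joined, so frame-level concatenation is an assumption here. The layer sizes, the function names, and the random weights (standing in for a network trained on synthetic speech) are all hypothetical and are not taken from the paper.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random weights standing in for trained parameters (illustrative only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def forward(frame, hidden_w, output_w):
    """One tanh hidden layer followed by a linear output layer (no biases)."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, frame))) for row in hidden_w]
    return [sum(w * h for w, h in zip(row, hidden)) for row in output_w]

def append_articulatory(cepstra, hidden_w, output_w):
    """Concatenate each cepstral frame with its estimated trajectory values."""
    return [frame + forward(frame, hidden_w, output_w) for frame in cepstra]

rng = random.Random(0)
N_CEPSTRA, N_HIDDEN, N_TRAJ = 13, 20, 8  # hypothetical dimensions
hidden_w = make_layer(N_CEPSTRA, N_HIDDEN, rng)
output_w = make_layer(N_HIDDEN, N_TRAJ, rng)

# Five frames of made-up cepstral features in place of real speech.
cepstra = [[rng.gauss(0.0, 1.0) for _ in range(N_CEPSTRA)] for _ in range(5)]
combined = append_articulatory(cepstra, hidden_w, output_w)
```

Each entry of `combined` holds the original cepstral values followed by the estimated trajectory values for the same frame, and a combined vector of this kind is what an acoustic model would be trained on under the concatenation assumption.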


Bibliographic details

Source
IEEE International Conference on Acoustics, Speech and Signal Processing | 2013 | pp. 7145-7149 | 5 pages
Authors
Mitra Vikramjit; Wang Wen; Stolcke Andreas; Nam Hosung
Original format
PDF
Keywords
articulatory trajectories; artificial neural networks; large vocabulary speech recognition; vocal tract variables

Similar literature

1. Automatic speech recognition using articulatory features from subject-independent acoustic-to-articulatory inversion [J]. Ghosh P.K., Narayanan S. The Journal of the Acoustical Society of America. 2011, No. 4aPta1
2. Reducing latency for language identification based on large-vocabulary continuous speech recognition [J]. Takuma Okamoto, Atsuo Hiroe, Hisashi Kawai. Acoustical science and technology. 2017, No. 1
3. A segmental framework for fully-unsupervised large-vocabulary speech recognition [J]. Kamper Herman, Jansen Aren, Goldwater Sharon. Computer speech and language. 2017, No. nova
4. ARTICULATORY TRAJECTORIES FOR LARGE-VOCABULARY SPEECH RECOGNITION [C]. Vikramjit Mitra, Wen Wang, Andreas Stolcke, International Conference on Acoustics, Speech and Signal Processing. 2013
5. Balancing model resolution and generalizability in large-vocabulary continuous speech recognition. [D]. Luo, Xiaoqiang. 1999
6. Automatic speech recognition using articulatory features from subject-independent acoustic-to-articulatory inversion [O]. Prasanta Kumar Ghosh, Shrikanth Narayanan
7. Articulatory trajectories for large-vocabulary speech recognition [O]. Vikramjit Mitra, Wen Wang, Andreas Stolcke, 2013
8. Articulatory Trajectories for Large-Vocabulary Speech Recognition. [R]. A. Stolcke, C. Richey, H. Nam, J. Yuan, M. Liberman, V. Mitra, W. Wang. 2013
