Robust Word Recognition using articulatory trajectories and Gestures

Abstract

Articulatory Phonology views speech as an ensemble of constricting events, or gestures (e.g., narrowing the lips or raising the tongue tip). These gestures are produced at distinct organs along the vocal tract: the lips, tongue tip, tongue body, velum, and glottis. This study shows that articulatory information can help improve the performance of automatic speech recognition systems. That information takes the form of gestures and their output trajectories (tract variable time functions, or TVs). No natural speech database contains such articulatory information, so we used a synthetic speech dataset generated with the Haskins Laboratories TAsk Dynamic model of speech production (TADA). For each utterance, the dataset contains the acoustic waveform together with its corresponding gestures and TVs. First, we propose neural-network-based models that recognize the gestures and estimate the TVs from acoustic information. Second, we apply these articulatory models, trained on synthetic data, to the natural speech utterances of the Aurora-2 corpus to estimate their gestures and TVs. Finally, we show that the estimated articulatory information improves the noise robustness of a word recognition system when used along with cepstral features.
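
The final step above uses estimated articulatory information along with cepstral features. It can be illustrated with a minimal frame-level fusion sketch. The sketch rests on several assumptions, none taken from the paper:

- Cepstral frames and TV estimates are frame-synchronous.
- The TV estimator sees a context window of ±k neighbouring frames.
- `estimate_tvs` is a hypothetical callable standing in for a trained acoustic-to-TV network.
- The mean-based `toy_estimator` is only a placeholder.

The authors' actual network and fusion scheme may differ.

```python
from typing import Callable, List, Sequence

Frame = List[float]


def stack_context(frames: Sequence[Frame], k: int) -> List[Frame]:
    """Concatenate each frame with its k left and k right neighbours.

    Positions beyond the utterance edges reuse the first or last frame
    (edge padding), so every output window spans (2k + 1) frames.
    """
    n = len(frames)
    stacked: List[Frame] = []
    for t in range(n):
        window: Frame = []
        for offset in range(-k, k + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp index to [0, n-1]
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


def augment_with_tvs(
    cepstra: Sequence[Frame],
    estimate_tvs: Callable[[Frame], Frame],
    k: int = 2,
) -> List[Frame]:
    """Append estimated tract variables (TVs) to each cepstral frame.

    `estimate_tvs` is a hypothetical stand-in for a trained acoustic-to-TV
    network: it receives one context window and returns that frame's TVs.
    """
    windows = stack_context(cepstra, k)
    return [list(c) + list(estimate_tvs(w)) for c, w in zip(cepstra, windows)]


if __name__ == "__main__":
    # Toy 2-dimensional "cepstra" for three frames (not real MFCCs).
    cepstra = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

    # Hypothetical placeholder estimator: one pseudo-TV per frame,
    # computed as the mean of the context window.
    def toy_estimator(window: Frame) -> Frame:
        return [sum(window) / len(window)]

    for frame in augment_with_tvs(cepstra, toy_estimator, k=1):
        print(frame)
```

Each output frame is the original cepstral vector followed by the estimated TVs, so a downstream recognizer sees one longer, time-aligned feature vector per frame. Simple concatenation is only one possible way to combine the two streams.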

Bibliographic record

Source
Annual conference of the International Speech Communication Association; INTERSPEECH 2010 | 2011 | pp. 2038-2041 | 4 pages
Authors
Vikramjit Mitra; Hosung Nam; Carol Espy-Wilson; Elliot Saltzman; Louis Goldstein
Original format: PDF
Chinese Library Classification: Communications
Keywords
noise robust speech recognition; articulatory phonology; speech gestures; tract variables; TADA model; neural networks; speech inversion

Similar documents

1. Recognizing articulatory gestures from speech for robust speech recognition [J]. Mitra V., Nam H., Espy-Wilson C., The Journal of the Acoustical Society of America, 2012, no. 3aPta1.
2. Seeing the initial articulatory gestures of a word triggers lexical access [J]. Fort M., Kandel S., Chipot J., Language and Cognitive Processes, 2013, no. 8.
3. Gesture-Radar: A Dual Doppler Radar Based System for Robust Recognition and Quantitative Profiling of Human Gestures [J]. Zhu Wang, Zhiwen Yu, Xinye Lou, IEEE Transactions on Human-Machine Systems, 2021, no. 1.
4. Robust Word Recognition using articulatory trajectories and Gestures [C]. Vikramjit Mitra, Hosung Nam, Carol Espy-Wilson, Annual conference of the International Speech Communication Association, 2010.
5. Robust Mobile Visual Recognition System: From Bag of Visual Words to Deep Learning [D]. Li, Dawei, 2017.
6. Multi-target video-based face recognition and gesture recognition based on enhanced detection and multi-trajectory incremental learning [O]. Jirui Lin, Laiyuan Xiao, Tao Wu.
7. Robust speech recognition using articulatory gestures in a Dynamic Bayesian Network framework [O]. Vikramjit Mitra, Hosung Nam, Carol Y. Espy-Wilson, 2011.
8. Recognizing Articulatory Gestures from Speech for Robust Speech Recognition [R]. C. Espy-Wilson, E. Saltzman, H. Nam, L. Goldstein, V. Mitra, 2012.