International Workshop on Systems, Signals, Image Processing

MULTI-STREAM ASYNCHRONY DYNAMIC BAYESIAN NETWORK MODEL FOR AUDIO-VISUAL CONTINUOUS SPEECH RECOGNITION



Abstract

How best to describe the asynchrony between speech and lip motion is a key problem in audio-visual speech recognition. A Multi-Stream Asynchrony Dynamic Bayesian Network (MS-ADBN) model is proposed for audio-visual speech recognition. In this model, the audio and visual streams are synchronized at the word nodes, while between word nodes each stream has its own independent phone, phone-transition, and observation nodes; the word-transition probability is determined jointly by the audio and visual streams. Within each stream, every word is composed of its corresponding phones, and every phone is associated with an observation feature (audio features for the audio stream, visual features for the visual stream) whose emission probability is modeled by a Gaussian mixture model. Compared with the conventional multi-stream HMM, the MS-ADBN model loosens the audio-visual asynchrony constraint to the word level. Experimental results on a continuous-digit audio-visual database show that, in mismatched noise environments, the MS-ADBN model achieves an average improvement of 10.07% over the multi-stream HMM.
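The core mechanism the abstract describes can be illustrated with a minimal sketch: each stream scores a phone observation under its own Gaussian mixture model, the per-stream scores are accumulated along each stream's own phone alignment within a word, and the streams are fused only at the word boundary. This is a toy illustration, not the paper's implementation; the function names, the 1-D features, and the linear log-domain stream weighting are all assumptions for clarity.

```python
import math

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of a scalar feature x under a 1-D Gaussian mixture.

    In the MS-ADBN model each phone in each stream has such an emission
    model (the real system would use multivariate mixtures over MFCC or
    lip-shape feature vectors; scalars keep the sketch self-contained).
    """
    comps = [
        math.log(w) - 0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        for w, mu, var in zip(weights, means, variances)
    ]
    # log-sum-exp for numerical stability
    m = max(comps)
    return m + math.log(sum(math.exp(c - m) for c in comps))

def stream_word_loglik(frames, phone_models, alignment):
    """Score one stream's frames for a word, using that stream's OWN
    phone alignment -- streams may desynchronize inside the word."""
    return sum(
        gmm_loglik(x, *phone_models[alignment[t]])
        for t, x in enumerate(frames)
    )

def fuse_at_word_node(audio_ll, visual_ll, audio_weight=0.5):
    """Word-level fusion: the word-transition decision is made jointly
    by both streams (here, a hypothetical weighted log-linear combination)."""
    return audio_weight * audio_ll + (1.0 - audio_weight) * visual_ll
```

The key contrast with a state-synchronous multi-stream HMM is that `alignment` is chosen per stream: the audio stream may already be in the second phone of a word while the visual stream is still in the first, and only `fuse_at_word_node` forces the two back together.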
