International Workshop on Systems, Signals, Image Processing

MULTI-STREAM ASYNCHRONY DYNAMIC BAYESIAN NETWORK MODEL FOR AUDIO-VISUAL CONTINUOUS SPEECH RECOGNITION



Abstract

How best to describe the asynchrony between speech and lip motion is a key problem in audio-visual speech recognition. A Multi-Stream Asynchrony Dynamic Bayesian Network (MS-ADBN) model is proposed for audio-visual speech recognition. In this model, the audio and visual streams are synchronized at the word nodes, while between word nodes each stream has its own independent phone, phone-transition, and observation nodes; the word-transition probability is determined jointly by the audio and visual streams. Within each stream, every word is composed of its corresponding phones, and every phone is associated with an observation feature (audio features for the audio stream, visual features for the visual stream) whose emission probability is modeled by a Gaussian mixture model. Compared with the conventional multi-stream HMM, the MS-ADBN model loosens the audio-visual asynchrony constraint to the word level. Experimental results on a continuous-digit audio-visual database show that, in mismatched noise environments, the MS-ADBN model achieves an average improvement of 10.07% over the multi-stream HMM.
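The core mechanism the abstract describes can be illustrated with a minimal sketch: each stream scores a phone observation under its own Gaussian mixture model, the per-stream scores are accumulated along each stream's own phone alignment within a word, and the streams are fused only at the word boundary. This is a toy illustration, not the paper's implementation; the function names, the 1-D features, and the linear log-domain stream weighting are all assumptions for clarity.

```python
import math

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of a scalar feature x under a 1-D Gaussian mixture.

    In the MS-ADBN model each phone in each stream has such an emission
    model (the real system would use multivariate mixtures over MFCC or
    lip-shape feature vectors; scalars keep the sketch self-contained).
    """
    comps = [
        math.log(w) - 0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        for w, mu, var in zip(weights, means, variances)
    ]
    # log-sum-exp for numerical stability
    m = max(comps)
    return m + math.log(sum(math.exp(c - m) for c in comps))

def stream_word_loglik(frames, phone_models, alignment):
    """Score one stream's frames for a word, using that stream's OWN
    phone alignment -- streams may desynchronize inside the word."""
    return sum(
        gmm_loglik(x, *phone_models[alignment[t]])
        for t, x in enumerate(frames)
    )

def fuse_at_word_node(audio_ll, visual_ll, audio_weight=0.5):
    """Word-level fusion: the word-transition decision is made jointly
    by both streams (here, a hypothetical weighted log-linear combination)."""
    return audio_weight * audio_ll + (1.0 - audio_weight) * visual_ll
```

The key contrast with a state-synchronous multi-stream HMM is that `alignment` is chosen per stream: the audio stream may already be in the second phone of a word while the visual stream is still in the first, and only `fuse_at_word_node` forces the two back together.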
