Bimodal Speech Recognition Using Coupled Hidden Markov Models

Abstract

In this paper we present a bimodal speech recognition system in which the audio and visual modalities are modeled and integrated using coupled hidden Markov models (CHMMs). CHMMs are probabilistic inference graphs that have hidden Markov models as sub-graphs. Chains in the corresponding inference graph are coupled through matrices of conditional probabilities modeling temporal influences between their hidden state variables. The coupling probabilities are both cross-chain and cross-time. The latter is essential for allowing temporal influences between chains, which is important in modeling bimodal speech. Our bimodal speech recognition system employs a two-chain CHMM, with one chain associated with the acoustic observations and the other with the visual features. A deterministic approximation for maximum a posteriori (MAP) estimation is used to enable fast classification and parameter estimation. We evaluated the system on a speaker-independent connected-digit task. Compared with an acoustic-only ASR system trained using only the audio channel of the same database, the bimodal system consistently demonstrates improved noise robustness at all SNRs. We further compare the CHMM system reported in this paper with our earlier bimodal speech recognition system, in which the two modalities are fused by concatenating the audio and visual features. The recognition results clearly show the advantages of the CHMM framework in the context of bimodal speech recognition.
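
The coupling described in the abstract can be illustrated with a minimal sketch. It assumes a two-chain CHMM in which each chain's next state depends on the previous states of both chains, and it decodes by exact Viterbi search over the joint (audio, visual) state space. This is not the deterministic MAP approximation used in the paper, whose details the abstract does not give. The function name `chmm_viterbi` and all parameter names are hypothetical, and observation likelihoods are assumed to be precomputed per frame.

```python
import math
from itertools import product


def chmm_viterbi(pi_a, pi_v, trans_a, trans_v, like_a, like_v):
    """Exact Viterbi decoding over the joint state space of a two-chain CHMM.

    pi_a[k], pi_v[l]   : initial state probabilities of the audio / visual chain
    trans_a[i][j][k]   : P(a_t = k | a_{t-1} = i, v_{t-1} = j)  (cross-chain, cross-time)
    trans_v[i][j][l]   : P(v_t = l | a_{t-1} = i, v_{t-1} = j)
    like_a[t][k]       : likelihood of the audio observation at frame t given a_t = k
    like_v[t][l]       : likelihood of the visual observation at frame t given v_t = l
    Returns (best_log_prob, path), where path is a list of (a, v) state pairs.
    """
    def log(x):
        return math.log(x) if x > 0 else float("-inf")

    states = list(product(range(len(pi_a)), range(len(pi_v))))

    # Initialisation: both chains start independently.
    delta = {
        (k, l): log(pi_a[k]) + log(pi_v[l]) + log(like_a[0][k]) + log(like_v[0][l])
        for k, l in states
    }
    backpointers = []

    for t in range(1, len(like_a)):
        new_delta, ptr = {}, {}
        for k, l in states:
            # Score of reaching (k, l) from each previous joint state (i, j).
            def score(prev):
                i, j = prev
                return delta[prev] + log(trans_a[i][j][k]) + log(trans_v[i][j][l])

            best_prev = max(states, key=score)
            new_delta[(k, l)] = (
                score(best_prev) + log(like_a[t][k]) + log(like_v[t][l])
            )
            ptr[(k, l)] = best_prev
        delta = new_delta
        backpointers.append(ptr)

    # Backtrack from the best final joint state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path
```

The joint search costs on the order of (N_a N_v)^2 operations per frame for N_a audio states and N_v visual states, which suggests why a faster approximation is attractive for classification and training.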
