Conference: IEEE International Conference on Acoustics, Speech, and Signal Processing

FRAME-DEPENDENT MULTI-STREAM RELIABILITY INDICATORS FOR AUDIO-VISUAL SPEECH RECOGNITION

Abstract

We investigate the use of local, frame-dependent reliability indicators of the audio and visual modalities, as a means of estimating stream exponents of multi-stream hidden Markov models for audio-visual automatic speech recognition. We consider two such indicators for each modality, defined as functions of the speech-class conditional observation probabilities of appropriate audio- or visual-only classifiers. We subsequently map the four reliability indicators into the stream exponents of a state-synchronous, two-stream hidden Markov model, as a sigmoid function of their linear combination. We propose two algorithms to estimate the sigmoid weights, based on the maximum conditional likelihood and minimum classification error criteria. We demonstrate the superiority of the proposed approach on a connected-digit audio-visual speech recognition task, under varying audio channel noise conditions. Indeed, the use of the estimated, frame-dependent stream exponents results in a significantly smaller word error rate than using global stream exponents. In addition, it outperforms utterance-level exponents, even though the latter utilize a priori knowledge of the utterance noise level.
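
The mapping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the bias term, and the constraint that the audio and visual exponents sum to one are assumptions made for illustration; the abstract states only that the exponents are a sigmoid function of a linear combination of the four indicators. The indicator values are treated as given inputs, since their exact definitions are not part of the abstract.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def stream_exponents(indicators, weights, bias=0.0):
    """Map per-frame reliability indicators to (audio, visual) exponents.

    indicators: the four reliability indicator values for one frame
                (two audio, two visual); their definitions are not given
                in the abstract, so they are plain inputs here.
    weights:    sigmoid weights, one per indicator (hypothetical values;
                the paper estimates them with MCL or MCE training).
    bias:       assumed offset term, not confirmed by the abstract.
    """
    z = bias + sum(w * d for w, d in zip(weights, indicators))
    lam_audio = sigmoid(z)
    # Assumption: the two exponents are constrained to sum to one.
    return lam_audio, 1.0 - lam_audio


def combined_log_likelihood(log_b_audio, log_b_visual, lam_audio, lam_visual):
    """Two-stream state score: exponent-weighted sum of stream log-likelihoods."""
    return lam_audio * log_b_audio + lam_visual * log_b_visual


# Usage for a single frame with made-up numbers.
lam_a, lam_v = stream_exponents([0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0])
score = combined_log_likelihood(-2.0, -4.0, lam_a, lam_v)
```

Because the exponents are recomputed at every frame, a noisy audio segment can shift weight toward the visual stream without any utterance-level knowledge of the noise condition.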